bcgov / name-examination

Service BC Name Examination System

Confirm NE Mainframe Auto-Examination/Searching Logic #348

Closed LJTrent closed 6 years ago

LJTrent commented 6 years ago

Task (Use for Work not Directly related to a Story but supports the Sprint Goals)

Detailed Description

Review my existing logic for each part of names examination against what the mainframe is currently doing, to confirm our understanding and guide what we may build using the SOLR search engine.

Sprint Goal

Support the development of auto-examination process.

Acceptance Criteria

Definition of Done

Issue Checklist

LJTrent commented 6 years ago

Reviewed the mainframe code with Bob and determined the pieces that we can include in the SOLR queries. The mainframe's processing includes steps that prepare the name for searching and insert the prepared name into the compname and compnam1 tables in COBRS.

The rules to create a compressed key for a company name (input):
- Take the input name and remove 'THE'.
- Take that result and replace the following:
  - BRITISHCOLUMBIA => BC
  - '#' => NUMBER
  - & => AND
- Replace digits 0-9 with words: 0 => ZERO, 1 => ONE, 2 => TWO, 3 => THREE, 4 => FOUR, 5 => FIVE, 6 => SIX, 7 => SEVEN, 8 => EIGHT, 9 => NINE

Output => compressed_key for a company name. Need to confirm why this is done.
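To make the compressed-key rules easier to compare against a future SOLR analyzer, here is a minimal Python sketch of them. It is not the mainframe code. The function name `compressed_key` is hypothetical, and several details are assumptions the notes above do not settle: 'THE' is matched as a whole word only, BRITISH COLUMBIA is matched with or without the space, each substituted word is padded with spaces, and whitespace is collapsed rather than removed.

```python
import re

# Digit-to-word table taken from the rules above.
DIGIT_WORDS = {
    "0": "ZERO", "1": "ONE", "2": "TWO", "3": "THREE", "4": "FOUR",
    "5": "FIVE", "6": "SIX", "7": "SEVEN", "8": "EIGHT", "9": "NINE",
}


def compressed_key(name: str) -> str:
    """Sketch of the compressed-key rules (hypothetical helper)."""
    key = name.upper()
    # Assumption: 'THE' is removed only when it is a whole word.
    key = re.sub(r"\bTHE\b", " ", key)
    # Assumption: the space between BRITISH and COLUMBIA is optional.
    key = re.sub(r"BRITISH\s*COLUMBIA", "BC", key)
    # Assumption: substituted words are padded so they stay separate words.
    key = key.replace("#", " NUMBER ").replace("&", " AND ")
    key = re.sub(r"[0-9]", lambda m: " " + DIGIT_WORDS[m.group()] + " ", key)
    # Assumption: runs of whitespace are collapsed, not removed.
    return " ".join(key.split())
```

For example, `compressed_key("The 2 Brothers & Sons #1")` returns `"TWO BROTHERS AND SONS NUMBER ONE"` under these assumptions. If the mainframe actually strips all whitespace from the key, the final line would join on an empty string instead.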

Routine to insert into compname:
- Remove 'AND'.
- Substitutions for special characters:
  - B.C.S, B.C. S, B. C. S, BCS => BC
  - Remove &, space, -, /
  - Substitute for #, $, cent symbol, and 0-9:
    - hash symbol => NUMBER
    - $ => DOLLAR
    - cent symbol => CENT
    - 0-9 => digits replaced with words
  - Remove %
  - Remove double blanks (space space)
  - Substitute:
    - BRITISH COLUMBIAS => BC
    - BRITISH COLUMBIA => BC
    - BRITISH COLUMBIAN => BC
    - BRITISH COLUMBIANS => BC

- Remove NON PERSONAL LIABILITY
- Remove N P L, NP L
- Remove IN VOLUNTARY LIQUIDATION
- Remove OF CANADA
- Remove OF BC

- Build a word array from the prepared name.
- Determine the number of words in the prepared name.
- Remove the last word or phrase if it is: ASSOCIATION, ASSOC, ASSN, COMPANY, CO, CORPORATION, CORP, INCORPORATED, INC, INCORPOREE, LIABILITY, LIMITED, LTD, LIMITEE, LTEE, SOCIETY, SOC.
- Remove duplicate words.
- Remove these words after the first word: PLUS, AMPERSAND, AND, OF, THE, TO.
- Concatenate single-letter words.
- Concatenate special phrases (side-by-side words):
  - AIR + CONDITION => AIRCONDITION
  - AUTO + BROKER, COURT, SALE => AUTOBROKER, AUTOCOURT, AUTOSALE
  - BED + BREAKFAST => BEDBREAKFAST
  - BOWEN + ISLAND => BOWENISLAND
  - CAMPBELL + RIVER => CAMPBELLRIVER
  - ...

  I don't think we need to concatenate these words.
- Remove BC if it is the last word.
- Some additional manipulation of special phrases, but I am not sure why, e.g. substituting C, K, NEWWEST, NEWWESTMINISTER, PACIFICCOAST. I don't think we need any of these concatenations.

I will verify the business logic with Kaine. I believe this was done to facilitate the search algorithm on the mainframe and we won't require it in SOLR.
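As a way to check our reading of the compname routine, the sketch below implements a subset of it in Python. It is an illustration under assumptions, not the mainframe logic. The names `substitute` and `word_array` are hypothetical. The B.C.S variants, the N P L removals, the initial removal of 'AND', and the special-phrase concatenations are left out. Removed symbols are replaced with a space instead of being deleted outright. A trailing designation is removed only once. "After the first word" is read as "at any position other than the first".

```python
import re
from typing import List

DIGIT_WORDS = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]
SYMBOL_WORDS = {"#": "NUMBER", "$": "DOLLAR", "¢": "CENT"}
REMOVED_PHRASES = ["NON PERSONAL LIABILITY", "IN VOLUNTARY LIQUIDATION",
                   "OF CANADA", "OF BC"]
DESIGNATIONS = {"ASSOCIATION", "ASSOC", "ASSN", "COMPANY", "CO",
                "CORPORATION", "CORP", "INCORPORATED", "INC", "INCORPOREE",
                "LIABILITY", "LIMITED", "LTD", "LIMITEE", "LTEE",
                "SOCIETY", "SOC"}
DROP_AFTER_FIRST = {"PLUS", "AMPERSAND", "AND", "OF", "THE", "TO"}


def substitute(name: str) -> str:
    """Subset of the character and phrase substitutions (hypothetical)."""
    text = name.upper()
    # BRITISH COLUMBIA, COLUMBIAS, COLUMBIAN, COLUMBIANS => BC
    text = re.sub(r"\bBRITISH COLUMBIA(?:NS|N|S)?\b", "BC", text)
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {word} ")
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[int(m.group())]} ", text)
    # Assumption: %, &, / and - become a space rather than vanishing.
    text = re.sub(r"[%&/-]", " ", text)
    text = " ".join(text.split())  # also removes double blanks
    for phrase in REMOVED_PHRASES:
        text = re.sub(rf"\b{phrase}\b", " ", text)
    return " ".join(text.split())


def word_array(prepared: str) -> List[str]:
    """Subset of the word-array steps (hypothetical)."""
    words = prepared.upper().split()
    # Remove a trailing designation (assumption: at most one).
    if len(words) > 1 and words[-1] in DESIGNATIONS:
        words.pop()
    # Remove duplicate words, keeping the first occurrence.
    unique: List[str] = []
    for word in words:
        if word not in unique:
            unique.append(word)
    # Remove filler words everywhere except the first position.
    kept = unique[:1] + [w for w in unique[1:] if w not in DROP_AFTER_FIRST]
    # Concatenate runs of adjacent single-letter words.
    merged: List[str] = []
    run = ""
    for word in kept:
        if len(word) == 1:
            run += word
            continue
        if run:
            merged.append(run)
            run = ""
        merged.append(word)
    if run:
        merged.append(run)
    # Remove BC if it is the last word.
    if len(merged) > 1 and merged[-1] == "BC":
        merged.pop()
    return merged
```

With these assumptions, `word_array(substitute("A B C Plumbing & Heating of British Columbia Ltd"))` returns `["ABC", "PLUMBING", "HEATING"]`. Most of these steps look like candidates for SOLR analysis filters (synonyms, stop words, pattern replacement) rather than application code, but that mapping still needs to be confirmed.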

It also looks like we will not have to bring CPRD data for active incorporations into the names examination database, but we will have to build a SOLR index based on that CPRD data. How we do this dynamically still needs to be determined. We only need corporate registry data for active companies, and we will need corp num, corp type, date of incorporation, name, designation, directors, and director cities.

We will need to understand when to trigger the updates to the corp index in SOLR with a direct connection to the Oracle CPRD database.
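To make the corp index discussion concrete, here is a rough Python sketch of what one index document could look like, using the fields listed above. Everything in it is an assumption for illustration: the class `CorpDoc`, the field names, the choice of corp num as the unique `id`, and the sample values are hypothetical, and nothing is read from CPRD or sent to SOLR. It only builds a JSON array of documents, which is a shape SOLR's JSON update handler accepts.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CorpDoc:
    """Hypothetical shape of one active-company document."""
    corp_num: str
    corp_type: str
    incorporation_date: str  # assumption: ISO 8601 date string
    name: str
    designation: str
    directors: List[str] = field(default_factory=list)
    director_cities: List[str] = field(default_factory=list)


def to_solr_update(docs: List[CorpDoc]) -> str:
    """Serialize documents as a JSON array for an update request."""
    payload = []
    for doc in docs:
        body = asdict(doc)
        # Assumption: corp num doubles as the unique key of the index.
        body["id"] = doc.corp_num
        payload.append(body)
    return json.dumps(payload)


# Made-up sample record, not real registry data.
sample = CorpDoc(
    corp_num="0000001",
    corp_type="BC",
    incorporation_date="2000-01-01",
    name="EXAMPLE HOLDINGS",
    designation="LTD",
    directors=["JANE DOE"],
    director_cities=["VICTORIA"],
)
update_body = to_solr_update([sample])
```

Whatever ends up triggering the updates (a scheduled job or a change feed from Oracle) could produce the same payload, so the document shape can be agreed on before the trigger question is settled.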