Allow ability to pass in list of company name suffixes to be stripped after your current preprocessing step.

DeNederlandscheBank / name_matching


Many companies in the same domain share common suffixes. For example, many high-tech company names contain words such as systems, technology, technologies, or tech. Removing these words before matching should improve the match scores.

For example, my matching data currently contains "Cisco Systems" and the string I want to match is "Cisco", but the match score is only 37%. If I could preprocess "Cisco Systems" to "Cisco", I think the match score would be higher.

I think we just need one more parameter in the name_matcher constructor that accepts a custom set of words. These words would be stripped after the existing preprocessing step has removed punctuation, extra white space, etc.
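A minimal sketch of the requested stripping step, written as a standalone function. The names strip_suffix_words and HIGH_TECH_SUFFIXES are hypothetical and are not part of the name_matching API. The sketch assumes the suffix words should only be removed from the end of a name, and that at least one word should always remain.

```python
import re


def strip_suffix_words(name, suffix_words):
    """Lowercase the name, drop punctuation, then remove trailing suffix words.

    Hypothetical helper for illustration only; not part of name_matching.
    """
    # Replace punctuation with spaces and split on white space.
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    suffixes = {w.lower() for w in suffix_words}
    # Pop trailing suffix words, but never empty the name completely.
    while len(words) > 1 and words[-1] in suffixes:
        words.pop()
    return " ".join(words)


# Assumed example word list, taken from the words mentioned in this issue.
HIGH_TECH_SUFFIXES = {"systems", "technology", "technologies", "tech"}

print(strip_suffix_words("Cisco Systems", HIGH_TECH_SUFFIXES))  # cisco
```

Applying the same function to both the matching data and the strings to be matched would keep the two sides consistent. Whether the words should also be removed from the middle of a name is a design choice that this sketch leaves open.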


Allow ability to pass in list of company name suffixes to be stripped after your current preprocessing step. #23