Closed christophertull closed 6 years ago
I saw that there is a fuzzywuzzy
equivalent for R (https://github.com/mlampros/fuzzywuzzyR) for that.... I can try to do the merge, but I'm not sure where I can find the right supplier_status_table
.
Yeah my bad, I need to export it from our database still. I will get that to you ASAP
Good find on the fuzzywuzzyR
!
Can we close this? Believe this functionality has been integrated.
I would propose linking the location of the matching code for posterity and closing
This is done here. Most of the fuzzy matching performs well automatically. I double checked Victor's great work and all looks well. I further "automated" things by taking those file that did not match on the first pass and manually adding a mapping list for these special cases.
We would like to match the OWRS files (more specifically the dataframe of the OWRS utilities
df_OWRS
) with the utilities in thesupplier_status_table
. The names will probably not match exactly because of issues like spacing, plurals ("utility" vs. "utilities") and word order ("City of X" vs "X City of").Because of this, we will need to do some sort of fuzzy matching to join the files. e.g. first pass can be regexes and stuff, but might need something fancier like string similarity. Can be in
R
to match the main body of the analysis but could also be inpython
since this step is sort of stand-alone. If in python could utilize thefuzzywuzzy
package, not sure about comparable packages inR
.