CMCDragonkai closed this issue 6 years ago.
DW, I just found that the FuzzyWuzzy metric is not symmetric with respect to argument order:
fuzzy_distance('sma solar technology ag', 'Sungrow Power Supply Co Ltd') # 76
fuzzy_distance('Sungrow Power Supply Co Ltd', 'sma solar technology ag') # 72
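A BK-tree skips whole subtrees using the triangle inequality, so its results are only reliable when the distance function is a true metric, and a true metric must be symmetric. The 76 vs 72 results above show that `fuzzy_distance` (whose definition isn't shown here) is not. The sketch below is one option under that assumption, not FuzzyWuzzy's or pybktree's API: `levenshtein` swaps in plain edit distance, which is a true metric, and `symmetrized` is a hypothetical wrapper that makes an order-dependent distance agree for both argument orders.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Edit distance: a true metric (symmetric, obeys the triangle inequality)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row short; the distance is unchanged
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def symmetrized(dist: Callable[[str, str], int]) -> Callable[[str, str], int]:
    """Hypothetical wrapper: return the same value for both argument orders.

    This only restores symmetry; it does not guarantee the triangle
    inequality, which BK-tree pruning also relies on.
    """
    def wrapped(a: str, b: str) -> int:
        return max(dist(a, b), dist(b, a))
    return wrapped
```

Taking the larger of the two orders is a conservative choice: a pair is never reported as closer than either order says it is. Whether a symmetrized fuzzy score is good enough for a BK-tree still depends on it satisfying the triangle inequality, which edit distance does by construction.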
I'm attempting to use pybktree for string similarity. But compare with this:
The reason I'm checking for something with 0 distance is to figure out whether a string exists in the tree or not. I need to prevent duplicate submissions.
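Under a true metric, a distance of 0 means the two strings are identical, so an exact-duplicate check does not strictly need a tree query: a hash set answers it directly in average O(1) time. This may also be safer with a fuzzy score, where a distance of 0 need not imply identical strings. A minimal sketch, using a hypothetical `SubmissionRegistry` class that is not part of pybktree:

```python
from typing import Set


class SubmissionRegistry:
    """Hypothetical helper: rejects exact duplicate submissions.

    Kept alongside the BK-tree, so the tree is only needed for fuzzy
    (distance > 0) lookups.
    """

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def submit(self, s: str) -> bool:
        """Record s and return True, or return False if s was already submitted."""
        if s in self._seen:
            return False
        self._seen.add(s)
        return True

    def discard(self, s: str) -> None:
        """Forget s so it can be submitted again."""
        self._seen.discard(s)
```

Only `submit` calls that return True would then go on to be added to the tree.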
I'm also finding that I need the ability to prune the tree by removing strings that were already added.
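Deleting a node from a BK-tree is awkward, because every descendant was placed according to its distance from that node. One common workaround is to tombstone removed items: their nodes stay in place to route searches, but they are filtered out of results, and an occasional rebuild drops them for good. The standalone sketch below assumes an integer-valued true metric; `PrunableBKTree`, `remove`, and `compact` are hypothetical names, not pybktree's API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Set, Tuple

Node = Tuple[Any, Dict[int, Any]]  # (item, {edge distance: child node})


class PrunableBKTree:
    """Minimal BK-tree with tombstone-based removal (hypothetical helper)."""

    def __init__(self, distance: Callable[[Any, Any], int]) -> None:
        self.distance = distance
        self.root: Optional[Node] = None
        self.removed: Set[Any] = set()  # tombstoned items, hidden from queries

    def add(self, item: Any) -> None:
        self.removed.discard(item)  # re-adding a removed item revives it
        if self.root is None:
            self.root = (item, {})
            return
        node_item, children = self.root
        while True:
            d = self.distance(item, node_item)
            if d == 0:
                return  # already stored: d == 0 means identical under a metric
            if d not in children:
                children[d] = (item, {})
                return
            node_item, children = children[d]

    def find(self, item: Any, n: int) -> List[Tuple[int, Any]]:
        """Return live (distance, item) pairs within distance n, nearest first."""
        results: List[Tuple[int, Any]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_item, children = stack.pop()
            d = self.distance(item, node_item)
            if d <= n and node_item not in self.removed:
                results.append((d, node_item))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - n, d + n] can hold matches. Tombstoned nodes still route.
            for edge, child in children.items():
                if d - n <= edge <= d + n:
                    stack.append(child)
        return sorted(results)

    def remove(self, item: Any) -> None:
        """Hide item from queries; its node stays in place as a routing point."""
        self.removed.add(item)

    def __iter__(self) -> Iterator[Any]:
        """Yield every live (non-removed) item."""
        stack = [self.root] if self.root is not None else []
        while stack:
            node_item, children = stack.pop()
            if node_item not in self.removed:
                yield node_item
            stack.extend(children.values())

    def compact(self) -> None:
        """Rebuild from live items so tombstoned nodes are physically dropped."""
        live = list(self)
        self.root = None
        self.removed = set()
        for item in live:
            self.add(item)
```

For example, with `abs(a - b)` on integers as the metric, removing 10 hides it from `find(11, 1)` while 12 is still reached through 10's node. `compact()` costs one insertion per live item, so it makes sense to call it only once tombstones pile up.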