MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License

One problem: some words in to_list are changed to None #10

Closed (peilibo closed this issue 3 years ago)

peilibo commented 3 years ago

from polyfuzz import PolyFuzz
from_list = ["chanel"]
to_list = ["chanel"]
model = PolyFuzz("TF-IDF")
model.match(from_list, to_list)
res = model.get_matches()

Result

From    To  Similarity
0   chanel  None  0.0

Why is the value from to_list changed to None? Besides the TF-IDF method, the other methods have the same problem.

MaartenGr commented 3 years ago

That is actually a feature of PolyFuzz. When you pass two identical lists, it assumes you intend to compare a list of strings with itself, and in that case you do not want each string to be mapped to itself.

In your case, if you simply add a single word to one of the lists, then you will get the results you are looking for:

from polyfuzz import PolyFuzz
from_list = ["chanel"]
to_list = ["chanel", "another_word"]
model = PolyFuzz("TF-IDF")
model.match(from_list, to_list)
res = model.get_matches()

Result

From    To  Similarity
0   chanel  chanel  1.0
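
The behavior described above can be illustrated with a minimal sketch. This is not PolyFuzz's actual implementation: `match` is a hypothetical helper, and it uses the standard library's `difflib.SequenceMatcher` in place of TF-IDF. It only assumes the rule stated in this thread, namely that when both lists are identical, a string is not matched to itself.

```python
from difflib import SequenceMatcher


def match(from_list, to_list):
    """Hypothetical helper (not PolyFuzz code): best match for each string.

    When both lists are identical, the string at the same position is
    skipped, so a string is never mapped to itself.
    """
    same_lists = from_list == to_list
    results = []
    for i, source in enumerate(from_list):
        best, best_score = None, 0.0
        for j, target in enumerate(to_list):
            if same_lists and i == j:
                continue  # self-comparison: skip the string's own entry
            score = SequenceMatcher(None, source, target).ratio()
            if score > best_score:
                best, best_score = target, score
        results.append((source, best, best_score))
    return results


# Identical lists: the only candidate is the string itself, so nothing is left.
print(match(["chanel"], ["chanel"]))  # [('chanel', None, 0.0)]

# Different lists: the exact match is allowed again.
print(match(["chanel"], ["chanel", "another_word"]))  # [('chanel', 'chanel', 1.0)]
```

With identical single-element lists, excluding the self-match leaves no candidate, which is why the To column shows None with a similarity of 0.0.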

peilibo commented 3 years ago

OK, I got it. Thanks.