MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
733 stars, 67 forks

Similarity score comes out 0 for exact matches #19

Closed by ektaatomar 3 years ago

ektaatomar commented 3 years ago

Both TF-IDF and RapidFuzz return a score of 0 for exact matches.

APL | APL -> 0.000

MaartenGr commented 3 years ago

Could you share the code for reaching that output? Also, which version of PolyFuzz are you using?

ektaatomar commented 3 years ago

PolyFuzz version 0.3.0. I can't share the exact code I used, but I haven't done anything unusual. I'm using code snippets similar to:

    from polyfuzz import PolyFuzz
    from polyfuzz.models import TFIDF

    # from_list and to_list are lists of strings (here, one name each)
    tfidf = TFIDF(n_gram_range=(3, 3), min_similarity=0)
    model_ngram = PolyFuzz(tfidf).match(from_list, to_list)
    model_ngram.get_matches().tail(50)

I also tried TF-IDF without n-grams and RapidFuzz with partial ratio, and all of them gave me a score of 0 for exact matches more often than a score of 1.

I am passing one pair of names at a time as the from and to lists and collecting the scores. I apply this through a DataFrame lambda function because I have already worked out the best candidate pairs.

MaartenGr commented 3 years ago

Ah, I know what is happening! Since you are passing the same lists, the model thinks you want to compare a list with itself and it ignores pairings with the same index. You can solve this by simply adding a random word to one of the lists, so that the two lists are no longer identical.
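
The behavior described above can be illustrated with a small toy matcher. This is only a sketch under stated assumptions, not PolyFuzz's actual implementation: `best_matches` and `similarity` are hypothetical helpers, `difflib.SequenceMatcher` stands in for the TF-IDF and RapidFuzz scorers, and the self-comparison rule (skip the candidate at the same index when both lists are equal) is modeled directly from the explanation in this comment.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the TF-IDF or RapidFuzz scorer."""
    return SequenceMatcher(None, a, b).ratio()


def best_matches(from_list, to_list):
    """Toy matcher: for each entry in from_list, pick the best entry in to_list.

    Assumption for illustration: when both lists are equal, the call is
    treated as a self-comparison and the candidate at the same index is skipped.
    """
    same_lists = from_list == to_list
    results = []
    for i, source in enumerate(from_list):
        best_target, best_score = None, 0.0
        for j, target in enumerate(to_list):
            if same_lists and i == j:
                continue  # skip pairing an entry with itself
            score = similarity(source, target)
            if score > best_score:
                best_target, best_score = target, score
        results.append((source, best_target, best_score))
    return results


# One pair at a time with identical lists: the only candidate is skipped,
# so the exact match ends up with a score of 0.
identical = best_matches(["APL"], ["APL"])

# Workaround from this thread: add an extra word to one list so the lists
# differ and the exact match is scored normally.
padded = best_matches(["APL"], ["APL", "zzzz_placeholder"])

print(identical)
print(padded)
```

In this toy model the single-pair call with identical lists has no candidate left to score, which mirrors the reported `APL | APL -> 0.000` result, while the padded call scores the exact match. The output format of the real library may differ.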

ektaatomar commented 3 years ago

It worked. Thanks!