RobinL / fuzzymatcher

Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
MIT License

Readme should explain meaning of scores #45

Open soliverc opened 5 years ago

soliverc commented 5 years ago

On what scale are the matches scored?

I noticed that with fuzzymatcher.fuzzy_left_join my best_match_score ranges from -0.7 to +1.15.

What is the highest possible score in this case? Can it go higher than 1.15?

Usually for fuzzy matching I would use a cutoff of around 0.8 or 0.9, on a scale of 0 to 1.

soliverc commented 5 years ago

I ran the package again today and scores range from -1.4 to +2.5. I can't figure it out!

ghost commented 4 years ago

I agree, not sure how to read the scores.

Kreisash commented 4 years ago

Just to reiterate: it would be great to get an idea of what the scores mean, so that a comparison could be made between various matching algorithms/libraries. I find this library vastly quicker for large data sets, so it's a shame that this is one of the main drawbacks.

bobcolner commented 1 year ago

How can we change the scorer to return the true probability of a match?
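
Until the scores are documented, one possible stopgap for applying a 0-to-1 style cutoff is to min-max rescale the scores from a single run. The sketch below is only that: it assumes the raw scores have already been pulled out of the best_match_score column into a plain list, `rescale_scores` is a hypothetical helper (not part of fuzzymatcher), and the middle value 0.2 is made up for illustration. The rescaled values are relative to the range seen in that one run, so they are not match probabilities and are not comparable across runs or with other libraries.

```python
def rescale_scores(scores):
    """Min-max rescale raw scores to [0, 1], relative to this list only.

    Hypothetical helper, not part of fuzzymatcher. The result is NOT a
    probability: it only says where a score sits within this run's range.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to rescale, so map everything
        # to 1.0 (an arbitrary choice for this sketch).
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# -0.7 and 1.15 are the extremes reported above; 0.2 is an invented value.
raw = [-0.7, 0.2, 1.15]
scaled = rescale_scores(raw)

# Apply a cutoff of the kind mentioned earlier in the thread.
cutoff = 0.8
keep = [s >= cutoff for s in scaled]
```

Because the minimum and maximum change from run to run (the thread reports both -0.7 to +1.15 and -1.4 to +2.5), the same raw score can land on either side of the cutoff in different runs, so this is best treated as a rough filter, not a calibrated threshold.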