aceakash / string-similarity

Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance.
MIT License

Matching % seems incorrect #91

Closed by oom- 3 years ago

oom- commented 3 years ago

I just tried the example:

stringSimilarity.compareTwoStrings("healed", "sealed");
//0.8

=> 80% for a 1 letter change.

stringSimilarity.compareTwoStrings("healed", "ehaled");
//0.6

=> 60% for swapping 2 letters

OK, I get it, but then I tried another word that is 1 letter shorter (5 characters instead of 6):

stringSimilarity.compareTwoStrings("fuira", "fuia");
//0.57

=> 57% for a 1 letter change (23 points lower than the 80% above)

stringSimilarity.compareTwoStrings("furia", "fuira");
//0.25

=> 25% for swapping 2 letters (35 points lower than the 60% above)

It seems to me that the shorter the string, the more severely a difference is penalized. Is there a way to make the score "average out" so that it does not depend on the length?

aceakash commented 3 years ago

@oom- Different algorithms will have different trade-offs. This library implements the Sørensen–Dice coefficient as the similarity score. I would encourage you to try out other string comparison algorithms to see which one best fits your needs. This might be a good starting point.
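
To make the trade-off concrete, here is a minimal sketch of a bigram-based Sørensen–Dice score next to a length-normalized Levenshtein similarity. It is not the library's source: the function names `bigramCounts`, `diceSimilarity`, `levenshtein` and `levenshteinSimilarity` are hypothetical, and the sketch assumes plain strings without whitespace handling. The Dice score is 2 × (shared bigrams) / (total bigrams in both strings). A 5-letter word has only 4 bigrams, and swapping two letters in the middle of "furia" breaks 3 of them, whereas swapping the first two letters of "healed" breaks 2 of its 5.

```javascript
// Hypothetical helper (not part of string-similarity): count adjacent character pairs.
function bigramCounts(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Sketch of a Sørensen–Dice score on bigrams:
// 2 * shared bigrams / (bigrams in a + bigrams in b).
function diceSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const remaining = bigramCounts(a);
  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const pair = b.slice(i, i + 2);
    const left = remaining.get(pair) || 0;
    if (left > 0) {
      // Each bigram of a can be matched at most once.
      remaining.set(pair, left - 1);
      shared++;
    }
  }
  return (2 * shared) / ((a.length - 1) + (b.length - 1));
}

// Classic Levenshtein edit distance (insert, delete, substitute), row by row.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// One possible alternative: edit distance scaled by the longer string's length.
function levenshteinSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(diceSimilarity("healed", "sealed"));        // 0.8
console.log(diceSimilarity("healed", "ehaled"));        // 0.6
console.log(diceSimilarity("fuira", "fuia"));           // about 0.571
console.log(diceSimilarity("furia", "fuira"));          // 0.25
console.log(levenshteinSimilarity("furia", "fuira"));   // 0.6
console.log(levenshteinSimilarity("healed", "ehaled")); // about 0.667
```

With this normalization the two letter-swap cases land much closer together (0.6 and about 0.667), because an edit distance counts changed characters rather than broken character pairs. Whether that behavior suits a given use case is a judgment call; it is shown here only as one alternative to compare against.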