Yomguithereal / clj-fuzzy

A handy collection of algorithms dealing with fuzzy strings and phonetics.
http://yomguithereal.github.io/clj-fuzzy/
MIT License
262 stars 27 forks

Fix Jaro-Winkler result for two empty inputs #44

Open mhuerster opened 7 years ago

mhuerster commented 7 years ago

Resolves https://github.com/Yomguithereal/clj-fuzzy/issues/43

After trying a few other libraries (https://github.com/kiyoka/fuzzy-string-match and https://github.com/tonytonyjan/jaro_winkler), I've noticed that it seems more common to return 0 for two empty inputs than 1.

This PR implements and tests that behavior.

Yomguithereal commented 7 years ago

Hmm, that's quite strange. Conceptually, two empty strings are identical, so the similarity between them should be 1, not 0. I don't remember for sure, but I think my library computes the Jaro-Winkler similarity, not the distance. The library you cite computes the distance and correctly returns a distance of 0 between two empty strings; see, for instance, https://github.com/tonytonyjan/jaro_winkler/blob/master/test/test_jaro_winkler.rb#L42
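To make the similarity/distance distinction concrete, here is a minimal sketch in Python (not clj-fuzzy's actual Clojure code). The function names, the prefix scale `p=0.1`, and the prefix cap of 4 are assumptions chosen for illustration, as is the convention that two identical strings, including two empty ones, have similarity 1. Distance is taken as `1 - similarity`.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1 means identical."""
    # Assumed convention: identical strings (including "" and "") score 1.
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are within this many positions of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (hypothetical defaults)."""
    j = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Distance as the complement of similarity: 0 means identical."""
    return 1.0 - jaro_winkler_similarity(s1, s2)
```

Under these assumptions, `jaro_winkler_similarity("", "")` is 1 while `jaro_winkler_distance("", "")` is 0, so a library that returns 0 for two empty inputs is consistent with one that returns 1, provided the first reports distance and the second reports similarity.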

lenaschoenburg commented 7 years ago

The doc strings say the functions compute distance, not similarity.

Yomguithereal commented 7 years ago

@dignati you are right. There seems to be some confusion in the doc strings. The functions do compute the similarity, not the distance. My bad.