Yomguithereal / clj-fuzzy

A handy collection of algorithms dealing with fuzzy strings and phonetics.
http://yomguithereal.github.io/clj-fuzzy/
MIT License
261 stars 27 forks source link

Jaro-Winkler returns unexpected values for two nil inputs #43

Open mhuerster opened 7 years ago

mhuerster commented 7 years ago

Is this intended behavior?

user=> (require '[clj-fuzzy.jaro-winkler])
nil
user=> (clj-fuzzy.jaro-winkler/jaro-winkler nil nil)
0.4
Yomguithereal commented 7 years ago

Hello @mhuerster. I guess it's not. Do you think it should throw or do you think it should return 1? Broadly speaking, should nil be considered as an empty string or should we refuse to take it as a valid argument?

mhuerster commented 7 years ago

Thanks for responding, @Yomguithereal. I think either of those is reasonable. As a user, I'd expect consistency with the other functions exposed in clj-fuzzy.metrics, most (though not all) of which seem to treat nil as the empty string:

(use '[clj-fuzzy.metrics])
(levenshtein nil nil) ; => 0
(levenshtein "" "")   ; => 0

(dice nil nil) ; => 1.0
(dice "" "")   ; => 1.0

(sorensen nil nil) ; => 1.0
(sorensen "" "")   ; => 1.0

(mra-comparison nil nil) ; => NullPointerException
(mra-comparison "" "")   ; => {:minimum 5, :similarity 6, :codex ["" ""], :match true}

(jaccard nil nil) ; => ArithmeticException Divide by zero
(jaccard "" "")   ; => ArithmeticException Divide by zero

(hamming nil nil)  ; => 0
(hamming "" "")    ; => 0

(jaro nil nil) ; => 0
(jaro "" "")   ; => 0

(tversky nil nil) ; => ArithmeticException Divide by zero
(tversky "" "")   ; => ArithmeticException Divide by zero
Yomguithereal commented 7 years ago

Consistency would be good indeed. I don't have much time right now to dedicate to this issue but I'll gladly accept a pull request.

mhuerster commented 7 years ago

Great, thanks! I'll take a stab at it if I have some free time this/next week.