Yomguithereal / clj-fuzzy

A handy collection of algorithms dealing with fuzzy strings and phonetics.
http://yomguithereal.github.io/clj-fuzzy/
MIT License
262 stars, 27 forks

dice algorithm NaN #25

Closed Globegitter closed 9 years ago

Globegitter commented 9 years ago

I was just trying out the Dice algorithm, and it seems to have some minor bugs (or I am not understanding it quite right):

[Screenshot, 2014-11-13 at 09:16:01]

These are the results I am getting with strings of length 0 and length 1. Could this have anything to do with the input being characters rather than actual strings? Is that intended?

Yomguithereal commented 9 years ago

Hello @Globegitter. Well, this is rather unfortunate. I'll take a look as soon as possible. Does this bug apply to version 0.1.9 or 0.1.8?

For other cases, does the algorithm work correctly?

Globegitter commented 9 years ago

@Yomguithereal It applies to both. Otherwise it seems to be working really well - thanks for the library, it is really useful.

Yomguithereal commented 9 years ago

@Globegitter, I've checked this and can confirm that the bug comes from the Clojure code and therefore carries over into its JavaScript counterpart.

I can fix it but I have a problem here and you might be able to help me:

The Dice coefficient works on bigrams. So, traditionally, if you compare h and h, this will return 0, which is nonsense since both strings are the same.

So here is the choice I have to make:

Any opinion?
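To make the problem concrete, here is a minimal Python sketch of a textbook bigram Dice coefficient, 2|X ∩ Y| / (|X| + |Y|). It is not the clj-fuzzy code: the names `bigrams` and `naive_dice` are hypothetical, and using sets of bigrams (rather than multisets) is an assumption. A string shorter than two characters has no bigrams, so when both inputs are that short the formula degenerates to 0/0, which is consistent with the NaN in the issue title.

```python
import math


def bigrams(s):
    """Return the set of adjacent character pairs in s (hypothetical helper)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def naive_dice(a, b):
    """Textbook Dice coefficient on bigram sets, with no special cases."""
    x, y = bigrams(a), bigrams(b)
    total = len(x) + len(y)
    if total == 0:
        # Python raises ZeroDivisionError on 0 / 0, so NaN is returned
        # explicitly to mirror what floating-point division would give.
        return math.nan
    return 2 * len(x & y) / total


print(naive_dice("night", "nacht"))  # one shared bigram ("ht") out of 4 + 4
print(naive_dice("h", "h"))          # no bigrams on either side
print(naive_dice("", ""))            # same degenerate case
```

With only one input shorter than two characters, the denominator is non-zero and the result is simply 0.0.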

Yomguithereal commented 9 years ago

I've fixed the implementation. You can install the latest dev version with the following command for node if needed:

npm i git+https://github.com/Yomguithereal/clj-fuzzy.git

Globegitter commented 9 years ago

Oh, that is great, thank you! How did you resolve it then?

Yomguithereal commented 9 years ago

Second choice. I found other libraries, notably in Python, that prefer to fix the rationale of the algorithm. I went with that, so now h / h --> 1.0.
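One way such a fix might look, sketched in Python. This is an illustration only, not the code that was committed to clj-fuzzy. The names `bigrams` and `dice` are hypothetical. Returning 1.0 for two empty strings and 0.0 for differing inputs shorter than two characters are assumptions, since the thread only states that h / h now gives 1.0.

```python
def bigrams(s):
    """Return the set of adjacent character pairs in s (hypothetical helper)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Bigram Dice coefficient with the degenerate cases handled up front."""
    if a == b:
        # Identical strings are a perfect match, even with no bigrams.
        return 1.0
    if len(a) < 2 or len(b) < 2:
        # Assumption: a too-short string that differs from the other
        # shares no bigrams with it, so the similarity is 0.0.
        return 0.0
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


print(dice("h", "h"))          # identical single characters
print(dice("h", "a"))          # different single characters
print(dice("night", "nacht"))  # ordinary case, same result as the plain formula
```

Checking equality first keeps the ordinary bigram formula untouched for every input that has at least one bigram on each side.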

Globegitter commented 9 years ago

Awesome, thank you. Will test ASAP.