torhaa closed 6 years ago
You are right :smile:
I implemented it from a theoretical point of view, in which it is fine to state that two vectors of length 0 are equal. However, you are right that in the current use case this definition does not make much sense. Thanks for the fix :+1:
I encountered this problem while evaluating fast-text clusters for a corpus of a million articles from nrk.no. Cosine similarity divides the dot product by the product of the two vector lengths, so it is undefined when both term-frequency vectors have length 0. I fixed it by redefining cosine similarity for that case: the similarity of two zero-length vectors is now set to 0, so clusters with no similarity now get a score of 0.
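A minimal sketch of the convention described above, assuming term-frequency vectors are stored as sparse `collections.Counter` objects (term → count). The function name `cosine_similarity` is hypothetical and not taken from the project's code. The comment only describes the case where both vectors have length 0; returning 0 when just one of them is zero-length is an extra assumption made here, since that case would otherwise divide by zero as well.

```python
import math
from collections import Counter


def cosine_similarity(tf_a: Counter, tf_b: Counter) -> float:
    """Hypothetical helper: cosine similarity of two sparse term-frequency vectors.

    Zero-length vectors (no terms, or all counts zero) get similarity 0.0
    instead of raising ZeroDivisionError.
    """
    # Dot product over the terms the two vectors share.
    dot = sum(count * tf_b[term] for term, count in tf_a.items() if term in tf_b)
    # Euclidean lengths of each vector.
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    # Convention from the fix: two zero-length vectors are not similar.
    # Assumption: the same rule applies if only one vector is zero-length.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


a = Counter("the cat sat".split())
b = Counter("the cat ran".split())
print(cosine_similarity(a, b))                  # ≈ 0.667
print(cosine_similarity(Counter(), Counter()))  # 0.0
```

Under the earlier "theoretical" definition, two empty vectors would have been treated as equal; with the 0.0 convention, empty or term-less clusters can no longer look maximally similar to each other.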