Fix for issue: Weird results with nnSent (#10)

epfml / sent2vec

General purpose unsupervised sentence representations

By default, the comparison operator used by a priority queue of pairs compares the first elements of the pairs first. If those are equal, it compares the second elements. In your implementation this doesn't make sense: when predicting the nearest-neighbor sentences for a query from the corpus, the ranking should rely only on the similarity score and not on the sentence text itself. The dot product between the query and several corpus sentences very often yields the same similarity score. That is fine, as long as all of the tied sentences are returned.

Thus, I suggest using a custom comparator that compares only the first elements of the pairs, i.e. the similarity scores.
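
As a rough illustration of the idea (not the actual patch in this PR), a score-only comparator for a `std::priority_queue` of pairs could look like the sketch below. The names `ScoredSentence`, `CompareByScore`, and `topK` are hypothetical and do not come from the sent2vec code base, and the sketch assumes the scores are never NaN.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for illustration: (similarity score, sentence text).
using ScoredSentence = std::pair<float, std::string>;

// Orders pairs by score only. Sentences with equal scores are treated as
// equivalent, so the sentence text never influences the ranking.
struct CompareByScore {
    bool operator()(const ScoredSentence& a, const ScoredSentence& b) const {
        return a.first < b.first;  // max-heap on the similarity score
    }
};

// Returns up to k candidates with the highest scores, best first.
// The relative order of candidates with equal scores is unspecified.
std::vector<ScoredSentence> topK(const std::vector<ScoredSentence>& candidates,
                                 std::size_t k) {
    std::priority_queue<ScoredSentence, std::vector<ScoredSentence>,
                        CompareByScore> heap;
    for (const auto& candidate : candidates) {
        heap.push(candidate);
    }

    std::vector<ScoredSentence> result;
    while (!heap.empty() && result.size() < k) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

With the default `std::less` on pairs, two sentences with the same score would instead be ordered lexicographically by their text, which is the behavior this PR argues against.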

Fix for issue: Weird results with nnSent (#10) #13