Closed matthias-herrmann closed 4 years ago
You probably need to normalize your embeddings.
@mpagli I have already done that. Maybe it's the way words that are not part of the model's vocabulary are treated, or something else is wrong. I need to do some further research on that.
If the word is not in the vocab, sent2vec will give you an empty vector with 0 norm. This might be the trigger. Maybe check beforehand whether the word is in the vocab, or check whether the norm is zero.
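A minimal sketch of the checks suggested above, using NumPy (the helper name `safe_cosine` and the toy vectors are illustrative, not part of the sent2vec API). A properly normalized cosine similarity can never exceed 1, so guarding against zero-norm (out-of-vocabulary) vectors and dividing by both norms should keep the result in range:

```python
import numpy as np

def safe_cosine(a, b):
    # Guard against zero-norm vectors (e.g. sent2vec returns an
    # all-zero embedding for out-of-vocabulary input).
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0  # or None, depending on how you want to treat OOV
    return float(np.dot(a, b) / (na * nb))

# Toy example: centroid of two sentence vectors vs. a third vector.
vecs = np.array([[1.0, 2.0, 3.0],
                 [2.0, 0.0, 1.0]])
centroid = vecs.mean(axis=0)
query = np.array([1.0, 1.0, 1.0])

sim = safe_cosine(centroid, query)
# Cosine similarity is bounded in [-1, 1] by construction.
assert -1.0 <= sim <= 1.0
```

Note that cosine similarity naturally lies in [-1, 1], not [0, 1]; if only non-negative values are wanted, one common convention is to clip at 0 or rescale with `(sim + 1) / 2`.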
Calculating the centroid of some sentence vectors and then computing the cosine similarity between the centroid and another sentence vector returns a value above 1. I'm using the pretrained `sent2vec_wiki_bigrams` model. Is there a way to get only values between 0 and 1?