facebookresearch / InferSent

InferSent sentence embeddings

Cosine similarity FastText worse results bug #90

Closed deanott closed 5 years ago

deanott commented 5 years ago

I was comparing scores from a simple cosine similarity between the fastText and GloVe models.

I stumbled upon the fact that if I set version: 1 with fastText, I get better scores.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

cosine(model.encode("yes"), model.encode("cat"))
```

Output with fastText, version 1: Cosine scores: ('cat', 0.36306605)
Output with fastText, version 2: Cosine scores: ('cat', 0.71227503)

This versioning change can, however, cause KeyError: '</s>' errors on certain sentences.

I would assume fastText should give better similarity scores, given the benchmarks. My guess is that a tokenizing bug in models.py may be adding extra padding that inflates the similarity score. I am investigating now, but thought I would open an issue first to check.
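For context, the encoder in the snippet above was set up roughly as in the README (a minimal sketch; the checkpoint path, vector path, and vocabulary size are assumptions, not taken from this thread):

```python
import torch
from models import InferSent

V = 2  # flipping this to 1 is the only change that produces the score difference above
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': V}
model = InferSent(params_model)
model.load_state_dict(torch.load('encoder/infersent%s.pkl' % V))

# fastText vectors, since infersent2 was trained with fastText
model.set_w2v_path('fastText/crawl-300d-2M.vec')
model.build_vocab_k_words(K=100000)
```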

aconneau commented 5 years ago

Hi, infersent1 was trained with GloVe vectors and infersent2 was trained with fastText, so you cannot use fastText vectors with infersent1.

deanott commented 5 years ago

Hi, thanks for the reply. I was using infersent2 with fastText for this. It is only when I change the version number in the parameters that I get the result differences above.

aconneau commented 5 years ago

The version is linked to the model. If you use infersent1, you should use version 1, and if you use infersent2 you should use version 2. So if you use version 1 with infersent2 and fastText vectors, it is not surprising that this does not work, as we modified the way padding is done in version 2.
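Put differently, the version flag has to travel with its matching checkpoint and word vectors. A small helper along these lines keeps the pairing consistent (a sketch; the file paths are placeholders, not from this thread):

```python
import torch
from models import InferSent

def load_matched_encoder(version):
    # infersent1 was trained with GloVe (version 1), infersent2 with fastText (version 2).
    paths = {
        1: ('encoder/infersent1.pkl', 'GloVe/glove.840B.300d.txt'),
        2: ('encoder/infersent2.pkl', 'fastText/crawl-300d-2M.vec'),
    }
    model_path, w2v_path = paths[version]
    params = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
              'pool_type': 'max', 'dpout_model': 0.0, 'version': version}
    encoder = InferSent(params)
    encoder.load_state_dict(torch.load(model_path))
    encoder.set_w2v_path(w2v_path)
    return encoder
```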

deanott commented 5 years ago

Yup, that's what I thought, which is why I was confused that the model got better results when I changed only the version. Is it just a coincidence that I get better cosine scores, showing that the words "yes" and "cat" are different, when I change the versioning parameter?

aconneau commented 5 years ago

I'm not sure what you mean by "better results". Are you only looking at the cosine similarity between "yes" and "cat"? It is hard to conclude anything from a single example. To evaluate the quality of the cosine metric in the embedding space in a systematic manner, please refer to the "STS" tasks provided in SentEval.

https://github.com/facebookresearch/SentEval#downstream-tasks
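A rough sketch of such an STS evaluation, following the SentEval examples (the data path, the chosen tasks, and the tokenize flag are assumptions, not taken from this thread):

```python
import senteval

def prepare(params, samples):
    # Build the InferSent vocabulary over the evaluation sentences.
    params.infersent.build_vocab([' '.join(s) for s in samples], tokenize=False)

def batcher(params, batch):
    # Encode one batch of sentences into fixed-size embeddings.
    sentences = [' '.join(s) for s in batch]
    return params.infersent.encode(sentences, tokenize=False)

params_senteval = {'task_path': 'SentEval/data', 'usepytorch': True, 'kfold': 5,
                   'infersent': model}  # the InferSent encoder loaded earlier
se = senteval.engine.SE(params_senteval, batcher, prepare)
results = se.eval(['STS12', 'STS13', 'STS14', 'STS15', 'STS16'])
print(results)  # per-task Pearson and Spearman correlations
```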

deanott commented 5 years ago

Apologies for being unclear and not replying sooner! I am making the assumption that a higher cosine similarity score means the words are semantically close, with anything above 0.7 counting as similar.

If I compare a word like "yes" to a list such as "yeah", "advert", "television", "crying", they all score over 0.7. However, when I use version: 1 I get results closer to my assumption: the cosine scores for all of the words drop below 0.5, while "yeah" stays nearly the same. I have also run a test comparing longer sentences, for example "i like cake" and "i am a cake", and the problem persists.
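That comparison looks roughly like this, reusing the cosine helper from the first post (the encoder setup and the tokenize flag are assumptions):

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

words = ['yeah', 'advert', 'television', 'crying']
ref = model.encode(['yes'], tokenize=True)[0]
for word, emb in zip(words, model.encode(words, tokenize=True)):
    # With version 2 all of these come out above 0.7; only 'yeah' should stay high.
    print(word, cosine(ref, emb))
```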

I tested changing the version number on the more systematic STS tests with SentEval and InferSent. The Pearson and Spearman scores came out higher when the version was correctly set to 2.

The problem was with my test sample: although it scores better when the version parameter is changed, that is probably not the case in general, according to the STS tasks. I also believe adding a classifier on top for this type of test sample would get my desired results. Will close the issue now, and thanks.