UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Performance of the pretrained model #49

Closed. Kyubyong closed this issue 4 years ago.

Kyubyong commented 5 years ago

I ran the following command:

python examples/evaluation_stsbenchmark.py

And I got the following results:

2019-11-06 09:47:12 - Cosine-Similarity :        Pearson: 0.7415 Spearman: 0.7698
2019-11-06 09:47:12 - Manhattan-Distance:        Pearson: 0.7730 Spearman: 0.7712
2019-11-06 09:47:12 - Euclidean-Distance:        Pearson: 0.7713 Spearman: 0.7707
2019-11-06 09:47:12 - Dot-Product-Similarity:    Pearson: 0.7273 Spearman: 0.7270

I'm confused because you reported the best performance as 77.12 (cosine similarity, Spearman). According to the results above, it's 76.98. Please correct me if I'm wrong.
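
For reference, the evaluation that the script performs can be reproduced with the library's EmbeddingSimilarityEvaluator; the sketch below uses a few placeholder sentence pairs and gold scores, whereas evaluation_stsbenchmark.py loads the full STS benchmark test split.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Placeholder pairs and gold similarity ratings; the script uses the
# full STS benchmark test split instead.
sentences1 = ["A man is playing a guitar.", "A plane is taking off.", "A woman is eating."]
sentences2 = ["A person plays an instrument.", "An air plane departs.", "Someone is cooking."]
gold_scores = [4.0, 4.8, 1.5]

model = SentenceTransformer("bert-base-nli-mean-tokens")
evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores)

# Logs Pearson and Spearman correlations for cosine, Manhattan, Euclidean
# and dot-product similarity, as in the output above.
model.evaluate(evaluator)
```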

nreimers commented 5 years ago

Hi @Kyubyong, I started to report the maximum over the cosine / Manhattan / Euclidean / dot-product scores, so the reported 77.12 corresponds to the Manhattan Spearman score of 0.7712 in your run. I'm sorry if it is still mentioned somewhere that the reported scores in the README are from cosine similarity.

Cosine / Manhattan / Euclidean / dot-product are computationally quite comparable, i.e., for an unsupervised task like semantic search it does not really matter, in terms of computational overhead, whether I use cosine similarity, Manhattan distance, Euclidean distance, or dot product. The computation is comparable (sometimes equivalent), and efficient index structures can be created for each metric.
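
A minimal sketch of this point with faiss (assuming it is installed; the embeddings here are random placeholders): dot-product and Euclidean search each map directly onto a flat index, and cosine similarity reduces to a dot product after length-normalizing the embeddings.

```python
import numpy as np
import faiss  # assumed available; any ANN library with IP / L2 metrics works similarly

dim = 768
corpus_emb = np.random.rand(10000, dim).astype("float32")  # placeholder corpus embeddings
query_emb = np.random.rand(5, dim).astype("float32")       # placeholder query embeddings

# Dot-product search: inner-product index.
ip_index = faiss.IndexFlatIP(dim)
ip_index.add(corpus_emb)

# Euclidean search: L2 index.
l2_index = faiss.IndexFlatL2(dim)
l2_index.add(corpus_emb)

# Cosine search: length-normalize, then inner product equals cosine similarity.
faiss.normalize_L2(corpus_emb)
faiss.normalize_L2(query_emb)
cos_index = faiss.IndexFlatIP(dim)
cos_index.add(corpus_emb)

distances, ids = cos_index.search(query_emb, 10)  # top-10 most similar corpus entries per query
```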

For most sentence embedding methods, the choice of cosine / Manhattan / Euclidean / dot-product makes no large difference and the scores are comparable. But for some sentence embedding methods, it makes a big difference.

For example, when I used the XLNet-based models, I got quite bad scores with cosine similarity, about 20 percentage points lower than with Manhattan distance.

In order to eliminate the impact of the distance function, I think it is better to test with several functions (cosine / Manhattan / Euclidean) and to see what works for the selected sentence embedding method. In most cases, the differences are not that large, but in some cases it can play a big role.
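
A minimal sketch of such a comparison, assuming scikit-learn and scipy are available (the sentence pairs and gold scores are placeholders for the full STS benchmark test split):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics.pairwise import (
    paired_cosine_distances,
    paired_euclidean_distances,
    paired_manhattan_distances,
)
from sentence_transformers import SentenceTransformer

# Placeholder data; a real comparison would use the STS benchmark test split.
sentences1 = ["A man is playing a guitar.", "A plane is taking off.", "A woman is eating."]
sentences2 = ["A person plays an instrument.", "An air plane departs.", "Someone is cooking."]
gold_scores = [4.0, 4.8, 1.5]

model = SentenceTransformer("bert-base-nli-mean-tokens")
emb1 = np.asarray(model.encode(sentences1))
emb2 = np.asarray(model.encode(sentences2))

# Turn distances into similarities so that higher always means "more similar".
predicted = {
    "Cosine":    1 - paired_cosine_distances(emb1, emb2),
    "Manhattan": -paired_manhattan_distances(emb1, emb2),
    "Euclidean": -paired_euclidean_distances(emb1, emb2),
    "Dot":       np.sum(emb1 * emb2, axis=1),
}

for name, scores in predicted.items():
    print(f"{name}: Pearson {pearsonr(gold_scores, scores)[0]:.4f}, "
          f"Spearman {spearmanr(gold_scores, scores)[0]:.4f}")
```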

Best regards, Nils Reimers