Closed · aminst closed this 3 months ago
@tomaarsen what are your thoughts on adding this to the leaderboard? My guess is that almost all models would use cosine similarity, in which case it wouldn't add much information.
@KennethEnevoldsen I do think it makes sense to show this in the leaderboard for all tasks - I think we currently only say it for STS:
> Metric: Spearman correlation based on cosine similarity
But the other tasks primarily (exclusively?) use cosine similarity too. There are some models/tasks that perform a bit better with the (non-normalized) dot product, as it favors longer passages, but they're few and far between and not high on the leaderboard.
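As a quick illustration of why the choice matters (using toy, made-up vectors, not real embeddings): cosine similarity normalizes away vector length, while the raw dot product rewards larger norms, so an embedding with a bigger norm can outrank a better-aligned match under dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product of length-normalized vectors
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 1.0]
short_passage = [1.0, 1.0]  # same direction as the query, small norm
long_passage = [3.0, 0.0]   # different direction, but much larger norm

print(cosine(query, short_passage))  # 1.0   -> best match under cosine
print(cosine(query, long_passage))   # ~0.71
print(dot(query, short_passage))     # 2.0
print(dot(query, long_passage))      # 3.0   -> "wins" under raw dot product
```

The two metrics disagree on which passage is the better match, which is exactly why reporting the metric alongside the score is informative.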
From my understanding, @aminst is referring to the intended distance metric of the model itself (@aminst, do correct me if I am wrong), not the metric used by the task?
However, I do agree that a model might have been trained with a different metric in mind, and assuming a distance metric seems problematic. I would ideally allow the model to supply the distance metric and then we just report the score (e.g. spearman correlation) for whatever distance metric the model selects.
@KennethEnevoldsen Yes, that is exactly what I meant. It would be great if the leaderboard also showed the distance metric the model used during training. It would also help people avoid misusing the embeddings with a different metric. The use case I have in mind is the following; does it make sense?
Ohh, I see! Yes, that would indeed be optimal. I realised something similar with Sentence Transformers, so in Sentence Transformers v3 it will be possible to configure the similarity function in the model configuration. This will then be used when calling the new `SentenceTransformer.similarity` or `SentenceTransformer.similarity_pairwise` methods.
Additionally, ST models will start reporting their similarity function in the model card automatically, e.g. here.
That should help, at least with ST-based models.
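The idea of a model-declared similarity function can be sketched in plain Python: the model configuration names a metric, and a generic `similarity` method dispatches on it instead of hard-coding cosine. This is an illustrative sketch of the design, not the actual Sentence Transformers implementation; `ToyModel` and its attributes are made up for the example.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cosine(a, b):
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))

SIMILARITY_FNS = {"dot": _dot, "cosine": _cosine}

class ToyModel:
    """Stand-in for a model whose config declares its similarity function."""

    def __init__(self, similarity_fn_name="cosine"):
        self.similarity_fn_name = similarity_fn_name

    def similarity(self, a, b):
        # Dispatch on the declared metric rather than assuming cosine
        return SIMILARITY_FNS[self.similarity_fn_name](a, b)

    def similarity_pairwise(self, pairs):
        return [self.similarity(a, b) for a, b in pairs]

model = ToyModel("dot")
print(model.similarity([1.0, 2.0], [3.0, 4.0]))  # 11.0 under dot product
```

A benchmark (or leaderboard) could then read the declared metric from the model and report it alongside the score, instead of assuming one.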
It sounds like this is something we might consider adding after the additions in ST v3. I will leave the issue open, but at the moment we probably won't add it.
I have added an issue related to using a custom sim. within the benchmark, but for the similarity of the model we will probably leave that to the model card.
edit: will close for now, but feel free to re-open the discussion if you believe that there is more to add.
Hi, thanks for this awesome benchmark.
Is it possible to add the similarity metric used by each model in the benchmark? From what I understand, the similarity metric a model was trained with influences which similarity metric people should use when storing the generated embeddings in a vector database for later similarity searches. I believe this would help people easily choose which similarity metric to use when storing the embeddings. I can help and add this if it's valuable. Thanks!
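To make the use case concrete (again with toy vectors, not real embeddings): if a model was trained for cosine similarity but its unnormalized embeddings are stored in a dot-product index, nearest-neighbor rankings can differ from what the model intends; normalizing the embeddings at index time makes dot product equivalent to cosine and restores the intended ranking.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

query = [1.0, 1.0]
corpus = {
    "doc_a": [1.0, 1.0],  # aligned with the query
    "doc_b": [4.0, 0.0],  # larger norm, different direction
}

# Dot-product search over raw embeddings: doc_b wins (4.0 vs 2.0)
raw_top = max(corpus, key=lambda k: dot(query, corpus[k]))

# Dot-product search over normalized embeddings == cosine: doc_a wins
norm_top = max(corpus, key=lambda k: dot(normalize(query), normalize(corpus[k])))

print(raw_top, norm_top)  # doc_b doc_a
```

Knowing the model's intended metric up front tells you whether such normalization is needed before indexing.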