luffycodes opened this issue 2 years ago
We are currently working on creating a nice benchmark to compute these scores.
The 6 datasets from semantic search are from BEIR: https://github.com/beir-cellar/beir
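For context, BEIR reports retrieval quality primarily as nDCG@10. A minimal sketch of how that score is computed, assuming binary relevance labels for the ranked results (this is an illustrative implementation, not BEIR's own code):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: relevance of each result,
    # discounted by log2 of its (1-based) rank position + 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, k=10):
    # Normalize by the DCG of the ideal (perfectly sorted) ranking.
    ideal_dcg = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# A perfect ranking scores 1.0; putting a relevant doc at rank 3
# instead of rank 2 scores slightly below 1.0.
print(ndcg_at_k([1, 1, 0], k=3))
print(ndcg_at_k([1, 0, 1], k=3))
```

A perfect ranking always yields 1.0 regardless of how many relevant documents exist, which is why the metric is comparable across queries.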
Thanks a lot!
Just curious, what are the other 14 datasets? Are they STS12, etc.?
Only one of these is STSbenchmark, as STS is a horrible way to evaluate embedding models.
The others come from different domains and tasks (clustering, retrieval, duplicate detection).
@nreimers can you share the names of the 6 datasets from BEIR?
@nreimers would also be very interested in the names of the 6 datasets from BEIR as we want to reproduce some results and compare other models on the same benchmark :)
Hello, is there a script to run the evaluation mentioned on the website (https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models)? What were the 14 datasets used to measure the performance of sentence embeddings, and the 6 datasets for semantic search?
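While waiting for an official script, the semantic-search part of such an evaluation can be sketched end to end: encode the corpus and the query, rank documents by cosine similarity, then score the ranking. The toy bag-of-words `embed` below is only a placeholder for a real sentence encoder (e.g. a sentence-transformers model); the corpus and query are made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Placeholder "embedding": a bag-of-words term-count vector.
    # A real evaluation would call a sentence encoder here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tiny made-up corpus and query, just to show the pipeline shape.
corpus = {
    "d1": "the cat sat on the mat",
    "d2": "stock markets fell sharply today",
    "d3": "a kitten is a young cat",
}
query = "cat sat on a mat"

doc_vecs = {doc_id: embed(text) for doc_id, text in corpus.items()}
q_vec = embed(query)

# Rank all documents for the query, most similar first.
ranking = sorted(corpus, key=lambda d: cosine(q_vec, doc_vecs[d]), reverse=True)
print(ranking)
```

The per-query rankings produced this way would then be fed into a metric such as nDCG@10 against the dataset's relevance judgments and averaged over queries to get a single benchmark score.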