About Evaluation Scripts

embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

Apache License 2.0

1.74k stars 231 forks source link

Hi, I'm having difficulty submitting the results to the leaderboard, possibly due to the bug reported at https://github.com/embeddings-benchmark/mteb/issues/774.

So, I tried using "https://github.com/embeddings-benchmark/mteb/blob/main/scripts/merge_cqadupstack.py" to merge the 12 results of cqadupstack, and used https://github.com/embeddings-benchmark/mtebscripts/blob/main/results_to_csv.py to get the avarage scores for each task. Does this produce exactly the same scores as listed on the leaderboard? It seems the numbers of datasets match the ones reported in a paper (56 data sets in total).

I think it's nice to have some code/instruction for getting the final scores locally if it's just a matter of averaging scores stored in a result folder.

embeddings-benchmark / mteb

About Evaluation Scripts #857