embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

About Evaluation Scripts #857

Open twadada opened 2 months ago

twadada commented 2 months ago

Hi, I'm having difficulty submitting the results to the leaderboard, possibly due to the bug reported at https://github.com/embeddings-benchmark/mteb/issues/774.

So, I tried using https://github.com/embeddings-benchmark/mteb/blob/main/scripts/merge_cqadupstack.py to merge the 12 CQADupstack results, and then used https://github.com/embeddings-benchmark/mtebscripts/blob/main/results_to_csv.py to get the average scores for each task. Does this produce exactly the same scores as listed on the leaderboard? The number of datasets seems to match the one reported in the paper (56 datasets in total).
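For context, my understanding is that the merge step just averages the main retrieval metric over the 12 subforum result files, roughly like the sketch below (the `test` / `ndcg_at_10` field names are my assumptions about the result-file layout; the official merge_cqadupstack.py is the authoritative version):

```python
import glob
import json
import os

# Hypothetical path to one model's result folder; adjust to your setup.
RESULTS_DIR = "results/my-model"

# Collect the 12 per-subforum CQADupstack result files.
cqa_files = sorted(glob.glob(os.path.join(RESULTS_DIR, "CQADupstack*Retrieval.json")))
assert len(cqa_files) == 12, f"expected 12 CQADupstack files, found {len(cqa_files)}"

# Average the main retrieval metric (assumed to be nDCG@10) across the 12 subforums.
scores = []
for path in cqa_files:
    with open(path) as f:
        res = json.load(f)
    scores.append(res["test"]["ndcg_at_10"])  # assumed layout: split -> metric -> value

# Write the merged score as a single CQADupstackRetrieval result file.
merged = {
    "mteb_dataset_name": "CQADupstackRetrieval",  # assumed field name
    "test": {"ndcg_at_10": sum(scores) / len(scores)},
}
with open(os.path.join(RESULTS_DIR, "CQADupstackRetrieval.json"), "w") as f:
    json.dump(merged, f, indent=2)
```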

I think it would be nice to have some code/instructions for computing the final scores locally, if it's really just a matter of averaging the scores stored in a results folder — e.g. something along the lines of the sketch below.
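For instance (the file names, split keys, and choice of main metric per task here are just my guesses about the result-file layout; the leaderboard code would be the authoritative reference):

```python
import json
import os

RESULTS_DIR = "results/my-model"  # hypothetical path to one model's results

# Hand-maintained mapping from result file to (split, main metric).
# The entries below are illustrative assumptions; the leaderboard code
# defines the authoritative main score for each task.
MAIN_METRICS = {
    "Banking77Classification.json": ("test", "accuracy"),
    "STSBenchmark.json": ("test", ("cos_sim", "spearman")),
    "CQADupstackRetrieval.json": ("test", "ndcg_at_10"),
    # ... extend to cover all 56 tasks
}

scores = []
for fname, (split, metric) in MAIN_METRICS.items():
    with open(os.path.join(RESULTS_DIR, fname)) as f:
        res = json.load(f)[split]
    # Some metrics are nested one level deep (e.g. cos_sim -> spearman).
    value = res[metric] if isinstance(metric, str) else res[metric[0]][metric[1]]
    scores.append(value)

print(f"Average over {len(scores)} tasks: {sum(scores) / len(scores):.4f}")
```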

Muennighoff commented 2 months ago

That script should work & correspond to the LB; I've added a simpler script here: https://github.com/embeddings-benchmark/mteb/pull/858 - would be great if you could take a look and then we can merge it if you think it's helpful :)

https://github.com/embeddings-benchmark/mteb/issues/774 does not prevent submitting to the LB; it only breaks the automatic refresh, and we can always restart the space to include your scores.