embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

How to add a new leaderboard? #1358

Open shizhl-code opened 2 days ago

shizhl-code commented 2 days ago

Hi, this is really amazing work. We have built a new benchmark, accepted at EMNLP 2024, and we would like to integrate it into MTEB.

I noticed that the procedure is described at https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_leaderboard_tab.md, but I am still confused about some of the details. Could you please explain in more detail how to add a leaderboard to MTEB?

Samoed commented 2 days ago

You should integrate your benchmark into mteb by creating tasks. Then add the results to the results repo and update the leaderboard.
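For example, the result files for the results repo are typically produced by running mteb on your tasks. A minimal sketch, assuming a Sentence Transformers model and using the existing "SciFact" task purely as a placeholder for one of your own tasks:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Any Sentence Transformers model works; this small one is just for illustration.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)

# Replace "SciFact" with the names of your tasks once they are registered in mteb.
tasks = mteb.get_tasks(tasks=["SciFact"])

evaluation = mteb.MTEB(tasks=tasks)
# Writes one JSON result file per task under the output folder,
# which is what gets contributed to the results repo.
results = evaluation.run(model, output_folder=f"results/{model_name}")
```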

x-tabdeveloping commented 2 days ago

Can we see the paper? If you want to add it, I would open a PR where you add all the tasks and also add a new Benchmark object to benchmarks.py. You should also add results to the PR in the results folder. We're currently finishing up leaderboard 2.0. If your PR gets merged, it will automatically show up in the new leaderboard.
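For reference, a Benchmark entry in benchmarks.py roughly follows the shape below. This is only a sketch: "MAIRRetrievalExample" is a placeholder for tasks that would already have been added to mteb, and the exact field set should be checked against the existing entries in benchmarks.py.

```python
import mteb
from mteb.benchmarks.benchmarks import Benchmark

# Hypothetical entry for the new benchmark; the task name is a placeholder.
MAIR = Benchmark(
    name="MAIR",
    tasks=mteb.get_tasks(tasks=["MAIRRetrievalExample"]),
    description="MAIR: a one-line summary of what the benchmark evaluates.",
    reference="https://arxiv.org/abs/2410.10127",
    citation="""@article{placeholder2024mair, title={...}}""",  # BibTeX for the paper
)
```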

shizhl-code commented 2 days ago

Hi,

Thank you for your response! I’m excited to share that our paper has been accepted at EMNLP this year and is now available at https://arxiv.org/abs/2410.10127.

Should I open a PR for the repository https://github.com/embeddings-benchmark/leaderboard or https://github.com/embeddings-benchmark/mteb?

Looking forward to your guidance!

Best regards, Zhengliang

Samoed commented 2 days ago

You should open a PR in https://github.com/embeddings-benchmark/mteb

KennethEnevoldsen commented 2 days ago

Congratulations on the publication @shizhl-code! (@orionw you might find this interesting)

Just to sum up the comments, I would recommend:

1. Create an issue with a checklist (one item per dataset), e.g. "New benchmark: MAIR".
2. Add each dataset in a separate PR (after the first one this should be fairly quick). There is a guide here: https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md
3. Once all datasets have been added, we can add the new benchmark here: https://github.com/embeddings-benchmark/mteb/blob/main/mteb/benchmarks/benchmarks.py
4. Once that is through, we can add a leaderboard tab (we might have the next version of the leaderboard done before then, in which case this will happen automatically).

I imagine you can take inspiration from this PR: https://github.com/embeddings-benchmark/mteb/pull/1308. Notably, for retrieval tasks the data format can be a bit tricky.
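For context, the data an mteb retrieval task loads roughly follows the BEIR-style shape sketched below, with three dictionaries keyed by evaluation split. All ids and texts here are invented for illustration; the PR linked above shows the real thing.

```python
# Rough shape of the data an mteb retrieval task loads, keyed by eval split.
corpus = {
    "test": {
        "doc-1": {"title": "Example title", "text": "Body of the first document ..."},
        "doc-2": {"title": "", "text": "Body of the second document ..."},
    }
}
queries = {
    "test": {
        "query-1": "Which document discusses the example topic?",
    }
}
# qrels: for each query id, the relevant document ids with a relevance score.
relevant_docs = {
    "test": {
        "query-1": {"doc-1": 1},
    }
}
```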

orionw commented 1 day ago

Thanks for the ping @KennethEnevoldsen and congratulations @shizhl-code! Would be excited to see this added! Feel free to ping me in the issue with questions

shizhl-code commented 1 day ago

Thank you very much for the suggestions! I will create an issue and follow the procedure accordingly.

Best, Zhengliang
