embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Integrate JMTEB #749

Open Muennighoff opened 3 months ago

Muennighoff commented 3 months ago

It would be great to have the datasets from JMTEB (https://github.com/sbintuitions/JMTEB) that aren't already in MTEB integrated, so we can also add a Japanese leaderboard sometime 😊 cc @lsz05 @ryokan0123 @masaya-ohagi & anyone who may be interested

awinml commented 3 months ago

I would like to help integrate JMTEB. The following datasets from JMTEB are already present in MTEB:

The following ones would need to be added:

AlexeyVatolin commented 3 months ago

@Muennighoff, I can take over the integration of ESCI. I have already looked at the contents of this dataset. In the original dataset https://huggingface.co/datasets/tasksource/esci there are three languages: English, Spanish and Japanese. Do we want to add all three languages or just Japanese?

Muennighoff commented 3 months ago

> @Muennighoff, I can take over the integration of ESCI. I have already looked at the contents of this dataset. In the original dataset https://huggingface.co/datasets/tasksource/esci there are three languages: English, Spanish and Japanese. Do we want to add all three languages or just Japanese?

I think if we can add all 3 that'd be even better! 🙌