embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Add support for the Bright benchmark #971

Open swj0419 opened 1 week ago

swj0419 commented 1 week ago

Checklist for adding MMTEB dataset

Reason for dataset addition: (BRIGHT dataset, https://huggingface.co/datasets/xlangai/BRIGHT)

xiamengzhou commented 1 week ago

Thanks @swj0419 !

I see that many benchmarks have a separate task file for each subtask, e.g., CQA on this page. Conceptually, do you think we should maintain one BRIGHT page or multiple ones?

Also, would you suggest that we separate the standard and long settings into two files, given that their main_score differs? We use ndcg_at_10 for the standard setting and recall_at_1 for the long setting.
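
To make the difference concrete, here is a minimal sketch of how one main score per setting could be selected and computed. It is plain Python and does not use the mteb API. The `MAIN_SCORE` mapping and the helper names are hypothetical, and the metric definitions (linear-gain nDCG, recall as the fraction of relevant documents retrieved) are assumed standard forms, not necessarily what the BRIGHT or mteb evaluators compute.

```python
import math

# Hypothetical mapping from BRIGHT setting to its main score
# (metric names taken from the comment above).
MAIN_SCORE = {"standard": "ndcg_at_10", "long": "recall_at_1"}


def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k with linear gain; `relevance` maps doc id -> graded relevance."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevance, k):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


def main_score(setting, ranked_ids, relevance):
    """Compute the main score for one query under the given setting."""
    metric, k = MAIN_SCORE[setting].rsplit("_at_", 1)
    fn = {"ndcg": ndcg_at_k, "recall": recall_at_k}[metric]
    return fn(ranked_ids, relevance, int(k))


if __name__ == "__main__":
    relevance = {"d1": 1, "d3": 1}   # toy relevance judgments for one query
    ranking = ["d2", "d1", "d3"]     # toy system ranking
    for setting in MAIN_SCORE:
        print(setting, MAIN_SCORE[setting], main_score(setting, ranking, relevance))
```

If the two settings stay in one task file, a mapping like this is the kind of per-setting switch that would be needed. Splitting them into two files lets each one declare a single main score.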