swj0419 opened 1 week ago
Thanks @swj0419 !
I saw that many benchmarks have separate task files for each subtask, e.g., CQA on this page. Conceptually, do you think we should maintain one BRIGHT page or multiple ones?
Also, would you suggest separating the standard and long settings into two files, given that the `main_score` differs? We use `ndcg_at_10` for the standard setting and `recall_at_1` for the long setting.
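If we do split them, a rough sketch of how the two task definitions could differ (plain dicts for illustration; in `mteb` this information lives in each task's metadata object, and every field here except `main_score` is illustrative, not the actual API):

```python
# Hypothetical metadata for the two BRIGHT settings. Only main_score is the
# field under discussion; the task names and other fields are illustrative.
BRIGHT_STANDARD = {
    "name": "BrightRetrieval",          # illustrative name
    "type": "Retrieval",
    "main_score": "ndcg_at_10",         # standard setting
}

BRIGHT_LONG = {
    "name": "BrightLongRetrieval",      # illustrative name
    "type": "Retrieval",
    "main_score": "recall_at_1",        # long setting
}

# Keeping them in separate files means each task reports a single,
# unambiguous main_score on the leaderboard.
assert BRIGHT_STANDARD["main_score"] != BRIGHT_LONG["main_score"]
```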
### Checklist for adding MMTEB dataset

Reason for dataset addition: BRIGHT dataset (https://huggingface.co/datasets/xlangai/BRIGHT)

- [x] I have tested that the dataset runs with the `mteb` package.
- [x] I have run the following models on the task (adding the results to the PR). These can be run using the `mteb run -m {model_name} -t {task_name}` command.
  - [x] sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  - [x] intfloat/multilingual-e5-small
- [x] Run tests locally to make sure nothing is broken using `make test`. (NOTE: `make test` didn't work for me, failing with `pytest: error: unrecognized arguments: -n`, but `pytest --durations=5` passes.)
- [x] Run the formatter to format the code using `make lint`.