Muennighoff opened 4 months ago
Hi, thanks for reaching out. We'd be happy to have it included in MTEB. For more details you can refer to our paper, https://arxiv.org/pdf/2402.14151
Two things to note. First, each of the tasks included in BIRCO has a unique objective (i.e. instruction). We provide a reference version of each task-specific complex objective in our paper's appendix B1-B5. These objectives are crucial for instructing LLMs and embedding models to understand the task, without having to fine-tune the model.
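A minimal sketch of how such an objective might be prepended to a query before embedding. The instruction text, template, and function name here are illustrative placeholders, not the actual BIRCO objectives or the MTEB integration API (the reference objectives are in appendix B1-B5 of the paper):

```python
# Hypothetical sketch: combining a task-specific objective (instruction)
# with a raw query, a common pattern for instruction-following embedding
# models. The template below is an assumption, not BIRCO's exact format.

def build_instructed_query(instruction: str, query: str) -> str:
    """Prepend a task objective to a query before embedding it."""
    return f"Instruct: {instruction}\nQuery: {query}"

example = build_instructed_query(
    "Retrieve scientific paper abstracts relevant to a complex research question.",
    "methods for low-resource machine translation",
)
print(example)
```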
Second, our dataset DORIS-MAE is a scientific query-passage reranking dataset. For a given query, each paper abstract in the candidate pool receives a non-binary, continuous score between 0 and 2. Typically, we use 1 as the cutoff for determining relevance. See our previous paper https://neurips.cc/virtual/2023/poster/73559 for more details.
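As a minimal sketch, the continuous scores could be binarized with the cutoff of 1 mentioned above. The data layout (a mapping from abstract IDs to scores) and the inclusive treatment of scores exactly at the cutoff are assumptions for illustration:

```python
# Hypothetical sketch: mapping DORIS-MAE's continuous 0-2 relevance
# scores to binary labels using a cutoff of 1. Whether a score exactly
# at the cutoff counts as relevant (>= vs >) is an assumption here.

def binarize_scores(scores: dict, cutoff: float = 1.0) -> dict:
    """Treat any candidate scoring >= cutoff as relevant (1), else 0."""
    return {doc_id: int(score >= cutoff) for doc_id, score in scores.items()}

labels = binarize_scores({"abs_1": 1.7, "abs_2": 0.4, "abs_3": 1.0})
# labels == {"abs_1": 1, "abs_2": 0, "abs_3": 1}
```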
We are also happy to provide further clarification and assistance; you can reach us via the email addresses listed in our papers.
Great, thanks so much for the info! If you have the bandwidth to open a PR for the integration, that would be amazing; otherwise someone in the community might tackle it, so I've opened an issue: https://github.com/embeddings-benchmark/mteb/issues/818 😊
Cool work! It'd be great to have it integrated in MTEB (https://github.com/embeddings-benchmark/mteb) if you're interested :)