embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

feat: add CUREv1 retrieval dataset #1459

Closed dbuades closed 1 day ago

dbuades commented 1 week ago

CUREv1

Over the past year, we’ve worked closely with medical professionals to develop this dataset, which we’re now sharing with the community to support research in point-of-care information retrieval—a critical daily task for many practitioners.

The CURE is a cross-lingual retrieval dataset organized into:

Each split and cross-lingual setting is composed of 200 natural language queries formulated by health care professionals, capturing their information needs when consulting academic literature during their daily work.

The corpus is constructed from an index of English passages extracted from biomedical academic articles. Passages are then marked as either Highly Relevant, Partially Relevant or Not Relevant with respect to each query.
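To illustrate how three-level relevance labels like these can feed a graded retrieval metric, here is a minimal sketch of nDCG@k. The numeric gains (2/1/0) and the helper names `dcg` and `ndcg_at_k` are assumptions made for this example; they are not taken from the dataset card or from the MTEB code.

```python
import math

# Assumed gain mapping: the description names the labels but not numeric scores.
GAIN = {"Highly Relevant": 2, "Partially Relevant": 1, "Not Relevant": 0}


def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_labels, all_labels, k=10):
    """nDCG@k for one query.

    ranked_labels: labels of the retrieved passages, in ranked order.
    all_labels: labels of every judged passage for the query (used for the ideal ranking).
    """
    gains = [GAIN[label] for label in ranked_labels[:k]]
    ideal = sorted((GAIN[label] for label in all_labels), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # A query with no relevant passages contributes 0 rather than dividing by zero.
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    judged = ["Highly Relevant", "Partially Relevant", "Not Relevant"]
    print(ndcg_at_k(judged, judged))                  # ideal ordering
    print(ndcg_at_k(list(reversed(judged)), judged))  # worst ordering of the same passages
```

With this mapping, ranking a Highly Relevant passage below a Not Relevant one is penalized more heavily than misplacing a Partially Relevant one, which is the main reason to keep the three levels rather than collapsing them to a binary label.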

For more details, please check the Dataset Card on the Hub 🤗. A preprint detailing the curation process and providing an extended rationale will soon be published on arXiv, along with pre-embedded indexes for several of these models!


MTEB(Medical)

At the same time, we take the opportunity to introduce a specialized benchmark that groups MTEB tasks relevant to the medical domain. Initially, we’ve included the following tasks, but we welcome any suggestions for additional tasks you think may be valuable:

We have also computed results for these tasks across 18 open-source models. We can upload them to the results repo or elsewhere; please point us in the right direction, as there seems to be a lot of activity around the new leaderboard! 💪


Adding datasets checklist

isaac-chung commented 2 days ago

Thanks for the PR! The only thing left to point out is that it'd be great if the PR description can be updated to reflect the changes made above. Otherwise I think this is good to merge. If you end up promoting this on socials, let us know :)

dbuades commented 23 hours ago

Thanks, @isaac-chung! Sorry I missed your last comment. I’ve updated the PR description retroactively and am currently running the 18 models on all the newly added tasks in the benchmark. Once that’s done, I’ll open a PR with the results.

As for promoting the work, do you have any specific ideas in mind? I was planning to post something on LinkedIn next week, which I can also share here. Additionally, we’re preparing a preprint that we’ll be uploading to arXiv soon. Maybe we could use that opportunity to co-write something for the HF blog? We can discuss the angle but I believe that it could be really interesting!

isaac-chung commented 22 hours ago

Thanks, @dbuades! Those all sound good, and I'm happy to share/repost what you have. An HF blog would be good as well - happy to collaborate there!

dbuades commented 7 hours ago

Perfect! I'll keep you posted next week.