beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0
1.55k stars · 187 forks

Adding TREC ad-hoc collections #3

Closed · thibault-formal closed this issue 3 years ago

thibault-formal commented 3 years ago

Hi, very useful work, thanks! I was wondering why you did not include standard ad-hoc retrieval collections (like Robust04) in the benchmark. Is this intentional? For people working on neural IR, it would be interesting to see how systematically models trained on MS MARCO generalize to these collections too.

thakur-nandan commented 3 years ago

Hi @thibault-formal, it definitely wasn't intentional. Unfortunately, there are a large number of interesting datasets out there, and we could only pick a few of them. I also find standard ad-hoc collections interesting for analysis. I am currently in the process of obtaining the Robust04 collection (sadly, it takes about a week to get access) and hope to have the analysis for this dataset soon.

Kind Regards, Nandan

thibault-formal commented 3 years ago

Hi @NThakur20, great! I think a lot of people will be interested in this, including myself. :) Cheers, Thibault

thakur-nandan commented 3 years ago

Hi @thibault-formal, apologies for the delay. The NIST website was down for a while, but I finally got hold of the Robust04 dataset. Below are the nDCG@10 scores on the dataset:

TREC-Robust04

Test queries: 249 · Documents: 528,155

| Model | nDCG@10 |
| --- | --- |
| BM25 | 0.387 |
| DPR (Multi) | 0.252 |
| SBERT (MiniLM-L6) | 0.293 |
| SBERT (DistilBERT-v3) | 0.318 |
| ANCE | 0.392 |
| ColBERT (100) | 0.391 |
| BM25 (100) + ELECTRA | 0.438 |
| BM25 (100) + MiniLM-L6 | 0.467 |
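For reference, a minimal sketch of how such an evaluation can be run with the beir package, assuming you have obtained Robust04 separately (it is licensed and not distributed with BEIR) and converted it to the BEIR format under a hypothetical local folder `datasets/robust04`; the model name mirrors the "SBERT (DistilBERT-v3)" row above:

```python
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Load a BEIR-formatted copy of Robust04 (corpus.jsonl, queries.jsonl, qrels/test.tsv).
# The path is a placeholder; adjust to wherever your converted collection lives.
corpus, queries, qrels = GenericDataLoader("datasets/robust04").load(split="test")

# Dense retrieval with an MS MARCO-trained Sentence-BERT model.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="cos_sim")

# Retrieve and score; k_values defaults to [1, 3, 5, 10, 100, 1000].
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # includes NDCG@10
```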
thibault-formal commented 3 years ago

Hi @NThakur20, great, thank you!