Hi, very useful work, thanks! I was wondering why you did not include standard ad-hoc retrieval collections (like Robust04) in the benchmark? Is this intended? For people working on neural IR, it would be interesting to see how well models trained on MS MARCO generalize to these collections too.
Hi @thibault-formal, it definitely wasn't intended. Unfortunately, there are a large number of interesting datasets out there, and we went ahead with only a few of them. I also find standard ad-hoc collections interesting for analysis; I am currently in the process of obtaining the Robust04 dataset (sadly, it takes about a week to get the collection) and hope to have the analysis for it soon.
Kind Regards, Nandan
Hi @NThakur20, great, I think a lot of people will be interested in this -- including myself. :) Cheers, Thibault
Hi @thibault-formal, apologies for the delay; the NIST website was down, but I finally got hold of the Robust04 dataset. Below are the nDCG@10 scores on the dataset:
Test queries: 249, number of documents: 528,155
| BM25 | DPR (Multi) | SBERT (MiniLM-L6) | SBERT (DistilBERT-v3) | ANCE | ColBERT (100) | BM25 (100) + ELECTRA | BM25 (100) + MiniLM-L6 |
|---|---|---|---|---|---|---|---|
| 0.387 | 0.252 | 0.293 | 0.318 | 0.392 | 0.391 | 0.438 | 0.467 |
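In case it helps others reproduce numbers like these, here is a minimal sketch of scoring a BM25 run with the `beir` library. It assumes Robust04 has already been converted into the standard BEIR format locally (the dataset is license-restricted, so there is no automatic download); the path `datasets/robust04`, the index name, and the Elasticsearch host are placeholders, not part of the repo.

```python
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.lexical import BM25Search as BM25

# NOTE: assumes you have converted Robust04 into BEIR format yourself
# (corpus.jsonl, queries.jsonl, qrels/test.tsv); the path is a placeholder.
corpus, queries, qrels = GenericDataLoader("datasets/robust04").load(split="test")

# BM25 via Elasticsearch; assumes an instance is running on localhost.
model = BM25(index_name="robust04", hostname="localhost", initialize=True)
retriever = EvaluateRetrieval(model)

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg["NDCG@10"])
```

A dense run follows the same retrieve/evaluate pattern: wrap an MS MARCO checkpoint (e.g. `models.SentenceBERT(...)`) in `DenseRetrievalExactSearch` and pass it to `EvaluateRetrieval` in place of the BM25 model.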
Hi @NThakur20, great, thank you!