beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Providing BM25 top-100 for all datasets #56

Closed davidmrau closed 2 years ago

davidmrau commented 2 years ago

First of all, thanks for this amazing benchmark. I'd like to evaluate a re-ranking model on several datasets. If I understood correctly, I would have to download and index every dataset independently to get the top-100 BM25 rank lists. Could you please provide those for each dataset to make evaluation easier?

Thanks a lot!
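In the meantime, for anyone who just needs first-stage rank lists on a small corpus without standing up Elasticsearch, here is a minimal pure-Python Okapi BM25 sketch. This is a toy illustration, not BEIR's implementation (BEIR retrieves via Elasticsearch's built-in BM25), and the `k1`/`b` values and whitespace tokenization are assumptions:

```python
# Minimal Okapi BM25 top-k ranking, self-contained (no Elasticsearch).
# k1=0.9, b=0.4 are assumed defaults for illustration only.
import math
from collections import Counter

def bm25_rank(queries, corpus, k1=0.9, b=0.4, top_k=100):
    """queries/corpus: dicts of id -> text.
    Returns {query_id: [(doc_id, score), ...]} sorted by descending score."""
    docs = {did: text.lower().split() for did, text in corpus.items()}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    # Document frequency of each term across the corpus.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))

    def idf(term):
        return math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))

    results = {}
    for qid, query in queries.items():
        q_terms = query.lower().split()
        scores = {}
        for did, toks in docs.items():
            tf = Counter(toks)
            dl = len(toks)
            s = 0.0
            for t in q_terms:
                if tf[t] == 0:
                    continue
                s += idf(t) * tf[t] * (k1 + 1) / (
                    tf[t] + k1 * (1 - b + b * dl / avgdl))
            if s > 0:
                scores[did] = s
        results[qid] = sorted(scores.items(),
                              key=lambda x: x[1], reverse=True)[:top_k]
    return results
```

Note that this scores every document per query, so it only makes sense for the smaller BEIR corpora; for MS MARCO-scale collections an inverted index (Elasticsearch, Pyserini) is the practical route.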

memray commented 1 year ago

I would echo this request, since I cannot reproduce the BM25 (Elasticsearch, default settings) results on certain datasets, particularly the ones with fewer docs like NFCorpus and SciFact.

Also, I found that the scores change slightly across runs. Is there any randomness inside Elasticsearch? Maybe @thakur-nandan can provide some insights? Thanks!
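On the run-to-run variance: Elasticsearch computes BM25 term statistics (IDF, average length) per shard by default, so with a multi-shard index the scores depend on how documents happened to be routed to shards during indexing. This effect is largest on small corpora like NFCorpus and SciFact. Two common workarounds, sketched below; the index name `nfcorpus` and field name `txt` are placeholders, not necessarily what BEIR's wrapper creates:

```shell
# Workaround 1: index into a single shard, so term statistics are global.
curl -X PUT "localhost:9200/nfcorpus" -H 'Content-Type: application/json' -d'
{
  "settings": { "index": { "number_of_shards": 1, "number_of_replicas": 0 } }
}'

# Workaround 2: keep the sharded index, but gather global term statistics
# at query time with dfs_query_then_fetch (slower per query).
curl -X GET "localhost:9200/nfcorpus/_search?search_type=dfs_query_then_fetch" \
  -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "txt": "example query text" } }
}'
```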