beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

evaluate_anserini_bm25.py retrieves 1000 documents for each query no matter which k I set in payload #146

Open zhiyuanpeng opened 1 year ago

zhiyuanpeng commented 1 year ago

Hi there,

I set k=10 in payload = {"queries": query_texts, "qids": qids, "k": 10}, but BM25 still retrieves 1000 documents for each query. As a workaround, I re-filter the 1000 documents down to the top 10 myself. It would be great if you could fix this bug. Thanks.
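For anyone hitting the same problem, here is a minimal sketch of the re-filtering workaround. It assumes the retrieval output is a dict mapping each query id to a {doc_id: score} dict; the helper name truncate_results and the toy data are hypothetical and not part of BEIR.

```python
from heapq import nlargest


def truncate_results(results, k):
    """Keep only the k highest-scoring documents for each query.

    `results` is assumed to look like {qid: {doc_id: score}}.
    This helper is illustrative only; it is not a BEIR function.
    """
    return {
        qid: dict(nlargest(k, doc_scores.items(), key=lambda item: item[1]))
        for qid, doc_scores in results.items()
    }


# Toy data standing in for the 1000-document output per query.
results = {
    "q1": {"d1": 1.2, "d2": 3.4, "d3": 2.1},
    "q2": {"d4": 0.5},
}
top2 = truncate_results(results, k=2)
```

Queries with fewer than k hits are returned unchanged, so the helper is safe to apply to every query before evaluation.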

thakur-nandan commented 1 year ago

Hi @zhiyuanpeng,

Thanks for reporting this bug. I'll add it to my to-do list and update the script.

evaluate_anserini_bm25.py actually uses an old Docker version of Anserini BM25. If you wish to run the latest Anserini BM25, I would point you to the regression and reproduction guides here: https://github.com/castorini/anserini/tree/master#%EF%B8%8F-regression-experiments--reproduction-guides.

The Anserini repository lets you change more parameters and ensures reproducible BM25 regressions on BEIR.

Hope it helps!

lintool commented 1 year ago

@thakur-nandan File an issue to redirect the BM25 baselines over to Pyserini? Will save you from having to answer such queries again in the future...