castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

number of hits for a given query is not as specified in the retrieval command #1864

Closed NourOM02 closed 2 months ago

NourOM02 commented 5 months ago

I ran BM25 on the cqadupstack/english dataset using the following command:

command = python -m pyserini.search.lucene --threads 16 --batch-size 128 --index beir-v1.0.0-cqadupstack-english.flat --topics beir-v1.0.0-cqadupstack-english-test --output run.beir.bm25-flat.cqadupstack-english.txt --output-format trec --hits 1000 --bm25 --remove-query

For almost every query the results look as expected, with the number of hits requested by --hits 1000. However, the query with id 84177 returns only 1 hit (as you can see in the image).

image
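To check whether 84177 is the only affected query, the hits per query in the output file can be counted. This is a minimal sketch, assuming the run file follows the usual six-column TREC layout (query id, Q0, document id, rank, score, run tag) that --output-format trec is expected to write. The helper names count_hits, short_queries and short_queries_in_file are hypothetical and not part of Pyserini.

```python
from collections import Counter


def count_hits(lines):
    """Count ranked documents per query id in TREC-format run lines.

    Assumed line layout (six whitespace-separated fields):
        qid Q0 docid rank score tag
    Lines that do not have exactly six fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue
        counts[parts[0]] += 1
    return counts


def short_queries(counts, expected_hits):
    """Return {qid: hit_count} for queries with fewer hits than expected.

    Queries with zero hits never appear in a run file, so they cannot be
    detected here; compare against the topic list to find those.
    """
    return {qid: n for qid, n in counts.items() if n < expected_hits}


def short_queries_in_file(run_path, expected_hits):
    """Convenience wrapper (hypothetical helper) that reads a run file."""
    with open(run_path, encoding="utf-8") as f:
        return short_queries(count_hits(f), expected_hits)
```

Calling short_queries_in_file("run.beir.bm25-flat.cqadupstack-english.txt", 1000) on the file produced by the command above would list every query id that received fewer than 1000 hits, together with the number of hits it did receive.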