beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

the number of queries is not as reported for some datasets? #111

lmh0921 closed this 1 year ago

lmh0921 commented 1 year ago

Hello,

For some datasets the number of queries does not match what is reported. For example, if I download FiQA from https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/fiqa.zip, queries.jsonl contains 6648 queries, but the paper and the GitHub README report 648. Could you tell me why they differ, and which number is correct?

Thanks

thakur-nandan commented 1 year ago

Hi @lmh0921,

The queries.jsonl file contains the train, development and test queries together in a single file. Refer to qrels/test.tsv for the 648 queries (unique query ids) in the test split.
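To get only the test queries, filter queries.jsonl by the query ids that appear in qrels/test.tsv. The snippet below is a minimal stdlib sketch of that filtering. It assumes the usual BEIR layout (one JSON object per line in queries.jsonl with `_id` and `text` fields, and a tab-separated qrels file whose header is `query-id`, `corpus-id`, `score`). The function names `split_query_ids` and `load_split_queries` are hypothetical helpers, not part of the BEIR package.

```python
import csv
import json
from pathlib import Path


def split_query_ids(qrels_path):
    """Return the set of unique query ids listed in a BEIR-style qrels TSV.

    Assumes a header row with a 'query-id' column (hypothetical helper).
    """
    with open(qrels_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # A query with several relevant documents appears on several rows,
        # so collect the ids into a set to count each query once.
        return {row["query-id"] for row in reader}


def load_split_queries(data_dir, split="test"):
    """Keep only queries from queries.jsonl whose _id appears in qrels/<split>.tsv."""
    data_dir = Path(data_dir)
    wanted = split_query_ids(data_dir / "qrels" / f"{split}.tsv")
    queries = {}
    with open(data_dir / "queries.jsonl", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines, if any
            record = json.loads(line)
            if record["_id"] in wanted:
                queries[record["_id"]] = record["text"]
    return queries
```

For the unzipped FiQA folder, `len(load_split_queries("fiqa"))` should then count only the test queries, which per the reply above is 648, while counting every line of queries.jsonl gives the full train + dev + test total. If you use the `beir` package itself, `GenericDataLoader(data_folder=...).load(split="test")` is, to my understanding, intended to apply the same qrels-based filtering.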

Kind Regards, Nandan Thakur

lmh0921 commented 1 year ago

OK, got it, thanks @thakur-nandan!