beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

the number of queries in MSMARCO #95

Closed by mjeensung 2 years ago

mjeensung commented 2 years ago

Hi,

I have a question regarding the number of queries in MSMARCO. According to the paper and the readme, the number of test queries in MSMARCO is 6,980.

However, when I ran the following code, I only got 43 queries.

>>> from beir.datasets.data_loader import GenericDataLoader
>>> corpus, queries, qrels = GenericDataLoader(data_folder='msmarco').load(split="test")
>>> print(len(queries))
43

Instead, I got 6,980 queries from the dev set. Should I use the dev queries when evaluating MSMARCO instead of the test queries?

Thanks!

cadurosar commented 2 years ago

Hi,

The split you are looking for is the "dev" split (so split="dev"). BEIR treats the MSMARCO "test" split as the query set from one of the TREC-DL competitions, which is why it contains only 43 queries.
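To check how many queries each split holds without loading the whole corpus, something like the sketch below could help. It assumes the usual BEIR folder layout, where each split has a tab-separated qrels/<split>.tsv file with a header row (query-id, corpus-id, score). The helper name count_queries_per_split is made up for illustration and is not part of the BEIR API.

```python
import csv
from pathlib import Path


def count_queries_per_split(data_folder):
    """Count distinct query IDs in each qrels/<split>.tsv of a BEIR-style folder.

    Hypothetical helper, not part of BEIR. Assumes every qrels file starts
    with a header row and keeps the query ID in the first column.
    """
    counts = {}
    for qrels_file in sorted(Path(data_folder, "qrels").glob("*.tsv")):
        with open(qrels_file, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter="\t")
            next(reader, None)  # skip the header row
            # A query may have several judged documents, so deduplicate IDs.
            counts[qrels_file.stem] = len({row[0] for row in reader if row})
    return counts
```

Calling count_queries_per_split("msmarco") on the folder from the question returns a dict keyed by split name. If the layout assumption holds, its "dev" and "test" entries should match the 6,980 and 43 reported in this thread.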

mjeensung commented 2 years ago

@cadurosar

Thanks! It's clear now.