beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

NQ - File datasets/nq/qrels/train.tsv not present #179

Open GiacoL opened 1 month ago

GiacoL commented 1 month ago

I downloaded the NQ dataset and the tsv file for the train set appears to be missing

2020uce0047 commented 1 month ago

Hi @GiacoL, there's a separate zip file for the train set: "nq-train". All available datasets are listed at https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/

Gerry-j commented 1 month ago

This URL contains the NQ dataset, but I didn't find train.csv in nq.zip: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/

Gerry-j commented 1 month ago

> I downloaded the NQ dataset and the tsv file for the train set appears to be missing

Please, where did you finally download train.csv from?

2020uce0047 commented 1 month ago

Hi @Gerry-j, the train set is here: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nq-train.zip
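For anyone scripting this, here is a minimal sketch using only the Python standard library. The helper names (`dataset_url`, `has_split`, `download_and_unzip`) are hypothetical and not part of BEIR; the base URL is the one from the links in this thread. It assumes the archive extracts to a folder named after the zip file and that qrels live under `qrels/<split>.tsv`, as in the path from the issue title.

```python
import os
import urllib.request
import zipfile

# Base URL taken from the dataset listing linked in this thread.
BASE_URL = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets"


def dataset_url(name):
    """Build the download URL for a dataset archive, e.g. 'nq-train'."""
    return f"{BASE_URL}/{name}.zip"


def has_split(data_dir, split):
    """Return True if qrels/<split>.tsv exists under data_dir."""
    return os.path.isfile(os.path.join(data_dir, "qrels", f"{split}.tsv"))


def download_and_unzip(name, out_dir):
    """Download <name>.zip into out_dir (if not already there) and extract it.

    Assumption: the archive unpacks into a folder called <name>.
    """
    os.makedirs(out_dir, exist_ok=True)
    zip_path = os.path.join(out_dir, f"{name}.zip")
    if not os.path.isfile(zip_path):
        urllib.request.urlretrieve(dataset_url(name), zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return os.path.join(out_dir, name)
```

Something like `has_split(download_and_unzip("nq-train", "datasets"), "train")` should then tell you whether the train qrels are present, provided the folder-name assumption holds.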

Gerry-j commented 1 month ago

Thank you very much!


orionw commented 1 week ago

Hi all, thanks for this info. Are the corpora not the same between the two? I see 18,060,996 lines in the corpus from this link, but the BEIR NQ corpus for test has 2,681,468. Perhaps the train set has the unfiltered corpus while the test set has the filtered version.

It seems the qrels for train also reference documents up to 18 million, so it appears one would have to index the train corpus separately to use them.
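One way to check this is to see which document IDs in a qrels file are absent from a given corpus. The sketch below is a rough illustration, not a BEIR API: the function names are hypothetical, and it assumes the usual BEIR layout of a `corpus.jsonl` with an `_id` field per line and a tab-separated qrels file with a `query-id`, `corpus-id`, `score` header.

```python
import csv
import json


def load_corpus_ids(corpus_path):
    """Collect the '_id' of every document in a JSON Lines corpus file."""
    ids = set()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                ids.add(json.loads(line)["_id"])
    return ids


def missing_qrels_docs(qrels_path, corpus_ids):
    """Return the corpus IDs referenced in qrels that are not in corpus_ids."""
    missing = set()
    with open(qrels_path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            if row["corpus-id"] not in corpus_ids:
                missing.add(row["corpus-id"])
    return missing
```

If `missing_qrels_docs` returns a non-empty set when the train qrels are checked against the test corpus, the two corpora really do differ and the train corpus needs its own index. Note that holding 18 million IDs in a set takes a fair amount of memory.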