castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0
1.03k stars 457 forks

Some BEIR queries are unused #2575

lintool closed this issue 1 month ago

lintool commented 2 months ago

E.g.,

% gunzip -c topics.beir-v1.0.0-bioasq.test.bge-base-en-v1.5.jsonl.gz | wc
gunzip -c topics.beir-v1.0.0-bioasq.test.splade-pp-ed.tsv.gz | wc
gunzip -c topics.beir-v1.0.0-bioasq.test.splade_distil_cocodenser_medium.tsv.gz | wc
gunzip -c topics.beir-v1.0.0-bioasq.test.tsv.gz | wc
gunzip -c topics.beir-v1.0.0-bioasq.test.unicoil-noexp.tsv.gz | wc
gunzip -c topics.beir-v1.0.0-bioasq.test.wp.tsv.gz | wc

    3743 2885853 63739253
     500 1185261 7980762
     500 1050806 6911775
     500    4525   39190
     500  401383 2265529
     500    7037   44962

The bge-base-en-v1.5 topics file has 3743 lines versus 500 for each of the others, so retrieval with it wastes cycles generating results for queries that won't be evaluated.
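The same check can be scripted. Below is a minimal sketch, assuming each gzipped topics file holds one query per line and that the smallest file reflects the set of evaluated queries. The helper names `count_lines` and `find_oversized` are hypothetical and not part of Anserini.

```python
import gzip
from pathlib import Path


def count_lines(path):
    """Count lines in a gzip-compressed text file (assumed: one query per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)


def find_oversized(paths):
    """Return {filename: line_count} for files with more lines than the smallest file."""
    counts = {Path(p).name: count_lines(p) for p in paths}
    if not counts:
        return {}
    # Assumption: the smallest file contains exactly the evaluated queries.
    baseline = min(counts.values())
    return {name: n for name, n in counts.items() if n > baseline}


if __name__ == "__main__":
    # Assumes the topics files sit in the current directory.
    files = sorted(Path(".").glob("topics.beir-v1.0.0-bioasq.test.*.gz"))
    for name, n in find_oversized(files).items():
        print(f"{name}: {n} lines")
```

This compares line counts only. Confirming which queries are actually unused would require comparing query IDs against the qrels, which this sketch does not attempt.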

lintool commented 1 month ago

Closed by https://github.com/castorini/anserini-tools/pull/83