castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

Instructions for reproducing runs on MS MARCO V2.1 with prebuilt indexes #2460

lintool closed this issue 5 months ago

lintool commented 6 months ago

Doc indexes:

java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics msmarco-v2-doc-dev -output runs/run.msmarco-v2.1-doc.dev.txt -hits 1000 -bm25
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics msmarco-v2-doc-dev2 -output runs/run.msmarco-v2.1-doc.dev2.txt -hits 1000 -bm25
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics trec2021-dl -output runs/run.msmarco-v2.1-doc.dl21.txt -hits 1000 -bm25
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics trec2022-dl -output runs/run.msmarco-v2.1-doc.dl22.txt -hits 1000 -bm25
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics trec2023-dl -output runs/run.msmarco-v2.1-doc.dl23.txt -hits 1000 -bm25
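The five search commands above differ only in the topics key and the run-file suffix, so they can be generated from one small table. The following is a minimal sketch, not part of Anserini: the `PAIRS` variable and the `search_commands` helper are illustrative names, and the function only prints the commands so they can be inspected before running.

```shell
#!/usr/bin/env bash
# Sketch: print the five document-index search commands from one table.
# PAIRS and search_commands are illustrative names, not Anserini features.

# "topics key:run suffix" pairs, copied from the commands above.
PAIRS="msmarco-v2-doc-dev:dev msmarco-v2-doc-dev2:dev2 trec2021-dl:dl21 trec2022-dl:dl22 trec2023-dl:dl23"

search_commands() {
  # $1 = prebuilt index name, e.g. msmarco-v2.1-doc
  local index="$1" pair topics suffix
  for pair in $PAIRS; do
    topics="${pair%%:*}"   # text before the colon
    suffix="${pair##*:}"   # text after the colon
    echo "java -cp \`ls target/*-fatjar.jar\` io.anserini.search.SearchCollection" \
         "-index $index -topics $topics" \
         "-output runs/run.$index.$suffix.txt -hits 1000 -bm25"
  done
}

search_commands msmarco-v2.1-doc
```

Piping the output to `bash` (`search_commands msmarco-v2.1-doc | bash`) would execute the printed commands one after another.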

bin/trec_eval -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc.dev.txt
bin/trec_eval -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc.dev2.txt

bin/trec_eval -c -M 100 -m map tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl21.txt
bin/trec_eval -c -M 100 -m recip_rank -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl21.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl21.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl21.txt

bin/trec_eval -c -M 100 -m map tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl22.txt
bin/trec_eval -c -M 100 -m recip_rank -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl22.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl22.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl22.txt

bin/trec_eval -c -M 100 -m map tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl23.txt
bin/trec_eval -c -M 100 -m recip_rank -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl23.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl23.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc.dl23.txt

Results, in the same order as the evaluation commands above (dev, dev2, then DL21, DL22, DL23):

recip_rank              all 0.1654
recip_rank              all 0.1732

map                     all 0.2281
recip_rank              all 0.8466
ndcg_cut_10             all 0.5183
recall_100              all 0.3502
recall_1000             all 0.6915

map                     all 0.0841
recip_rank              all 0.6623
ndcg_cut_10             all 0.2991
recall_100              all 0.1866
recall_1000             all 0.4254

map                     all 0.1089
recip_rank              all 0.5783
ndcg_cut_10             all 0.2914
recall_100              all 0.2604
recall_1000             all 0.5383

Segmented doc indexes:

java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc-dev -output runs/run.msmarco-v2.1-doc-segmented.dev.txt -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics msmarco-v2-doc-dev2 -output runs/run.msmarco-v2.1-doc-segmented.dev2.txt -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics trec2021-dl -output runs/run.msmarco-v2.1-doc-segmented.dl21.txt -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics trec2022-dl -output runs/run.msmarco-v2.1-doc-segmented.dl22.txt -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
java -cp `ls target/*-fatjar.jar` io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics trec2023-dl -output runs/run.msmarco-v2.1-doc-segmented.dl23.txt -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
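The segmented runs retrieve 10000 passage-level hits and then reduce them to at most 1000 documents. The sketch below illustrates that reduction as it is understood here: strip everything from the `#` delimiter onward to recover the document id, keep only the best-scoring segment per document, and stop after the document cutoff. It is an approximation for illustration, not Anserini's implementation. The `max_passage` helper and the ids `D1#2` and so on are hypothetical, and the input is assumed to be a TREC run already sorted by descending score within each query.

```shell
#!/usr/bin/env bash
# Sketch of max-passage aggregation over a TREC run file.
# max_passage is a hypothetical helper, not an Anserini command.

max_passage() {
  # $1 = delimiter between document id and segment suffix
  # $2 = maximum number of documents to keep per query
  # stdin: "qid Q0 segment_id rank score tag", best score first per query
  awk -v delim="$1" -v k="$2" '
    {
      qid = $1
      pos = index($3, delim)
      doc = (pos > 0) ? substr($3, 1, pos - 1) : $3   # drop the segment suffix
      key = qid SUBSEP doc
      if (key in seen) next        # a higher-scoring segment was already kept
      seen[key] = 1
      if (++kept[qid] > k) next    # past the per-query document cutoff
      print qid, $2, doc, kept[qid], $5, $6
    }'
}

printf '%s\n' \
  'q1 Q0 D1#2 1 9.5 run' \
  'q1 Q0 D2#0 2 9.1 run' \
  'q1 Q0 D1#0 3 8.7 run' \
  'q1 Q0 D3#4 4 8.0 run' | max_passage '#' 2
# prints:
# q1 Q0 D1 1 9.5 run
# q1 Q0 D2 2 9.1 run
```

Retrieving ten times more segments than the final cutoff leaves room for several segments of the same document to collapse into one entry while still filling the 1000-document list.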

bin/trec_eval -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev.txt runs/run.msmarco-v2.1-doc-segmented.dev.txt
bin/trec_eval -c -M 100 -m recip_rank tools/topics-and-qrels/qrels.msmarco-v2.1-doc.dev2.txt runs/run.msmarco-v2.1-doc-segmented.dev2.txt

bin/trec_eval -c -M 100 -m map tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl21.txt
bin/trec_eval -c -M 100 -m recip_rank -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl21.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl21.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.dl21-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl21.txt

bin/trec_eval -c -M 100 -m map tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl22.txt
bin/trec_eval -c -M 100 -m recip_rank -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl22.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl22.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.dl22-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl22.txt

bin/trec_eval -c -M 100 -m map tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl23.txt
bin/trec_eval -c -M 100 -m recip_rank -m ndcg_cut.10 tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl23.txt
bin/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl23.txt
bin/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.dl23-doc-msmarco-v2.1.txt runs/run.msmarco-v2.1-doc-segmented.dl23.txt

Results, in the same order as the evaluation commands above (dev, dev2, then DL21, DL22, DL23):

recip_rank              all 0.1973
recip_rank              all 0.2000

map                     all 0.2609
recip_rank              all 0.9026
ndcg_cut_10             all 0.5778
recall_100              all 0.3811
recall_1000             all 0.7115

map                     all 0.1079
recip_rank              all 0.7213
ndcg_cut_10             all 0.3576
recall_100              all 0.2330
recall_1000             all 0.4790

map                     all 0.1391
recip_rank              all 0.6519
ndcg_cut_10             all 0.3356
recall_100              all 0.3049
recall_1000             all 0.5852
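
To see how much the segmented index changes effectiveness, the two sets of scores can be put side by side. This is a small sketch that hard-codes the `ndcg_cut_10` values reported in this comment for DL21, DL22, and DL23; the `compare_runs` helper is an illustrative name, and other metrics can be compared by swapping in their values.

```shell
#!/usr/bin/env bash
# Sketch: difference between doc-index and segmented-index scores.
# compare_runs is an illustrative helper; the numbers come from this comment.

compare_runs() {
  # stdin columns: label, doc-index score, segmented-index score
  # output adds a fourth column: segmented minus doc
  awk '{ printf "%-5s %.4f %.4f %+.4f\n", $1, $2, $3, $3 - $2 }'
}

# ndcg_cut_10 on DL21, DL22, DL23: doc index vs. segmented doc index
compare_runs <<'EOF'
dl21 0.5183 0.5778
dl22 0.2991 0.3576
dl23 0.2914 0.3356
EOF
```

The last column is positive for all three topic sets, which matches the higher segmented scores listed above.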

lintool commented 5 months ago

Superseded by https://github.com/castorini/anserini/blob/master/docs/fatjar-regressions-v0.35.1.md