castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Better BM25 tuning with skopt #564

Closed: lintool closed this issue 3 years ago

lintool commented 3 years ago

I do grid search for tuning BM25 here: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md#bm25-tuning

Which is kinda stupid.

We should use skopt: https://scikit-optimize.github.io/stable/

@alexlimh can you please contribute this after EMNLP?
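For reference, the tuning loop that skopt would drive can be sketched as follows. This is a self-contained stand-in, not the eventual implementation: the `mock_mrr_at_10` surface and its numbers are invented for illustration, plain random search stands in for skopt's Gaussian-process sampler, and the parameter ranges are assumptions.

```python
import random

# Hypothetical stand-in for the real objective, which would run retrieval
# over the dev queries at (k1, b) and score the run. This smooth synthetic
# surface peaks near (0.6, 0.8); its numbers are illustrative only.
def mock_mrr_at_10(k1, b):
    return 0.19 - 0.02 * ((k1 - 0.6) ** 2 + (b - 0.8) ** 2)

def tune_bm25(objective, n_calls=50, seed=42):
    """Random search over (k1, b). skopt's gp_minimize would replace this
    uniform sampler with a Gaussian-process model of the objective."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_calls):
        k1 = rng.uniform(0.1, 2.0)  # assumed plausible BM25 ranges
        b = rng.uniform(0.0, 1.0)
        score = objective(k1, b)
        if best is None or score > best[0]:
            best = (score, k1, b)
    return best

score, k1, b = tune_bm25(mock_mrr_at_10)
print(f"best score {score:.4f} at k1={k1:.2f}, b={b:.2f}")
```

Note that `skopt.gp_minimize` minimizes, so in the real setting one would pass the negated metric as the objective.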

alexlimh commented 3 years ago

no problem, will look into that after EMNLP.

alexlimh commented 3 years ago

Hyperparameter tuning results on MS MARCO using skopt (Gaussian process) for 50 iterations. The original results using grid search can be found here: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md

| Setting | MRR@10 | MAP | Recall@1000 |
|---|---|---|---|
| Default (k1=0.9, b=0.4) | 0.1840 | 0.1926 | 0.8526 |
| Grid Search, Optimized for Recall@1000 (k1=0.82, b=0.68) | 0.1874 | 0.1957 | 0.8573 |
| Skopt, Optimized for Recall@1000 (k1=0.75, b=0.87) | 0.1885 | 0.1966 | 0.8596 |
| Grid Search, Optimized for MRR@10/MAP (k1=0.60, b=0.62) | 0.1892 | 0.1972 | 0.8555 |
| Skopt, Optimized for MRR@10 (k1=0.61, b=0.78) | 0.1907 | 0.1987 | 0.8578 |
| Skopt, Optimized for MAP (k1=0.60, b=0.81) | 0.1908 | 0.1989 | 0.8581 |

lintool commented 3 years ago

Nice, so there's still a bit more to be gained!

A few questions:

alexlimh commented 3 years ago
lintool commented 3 years ago

But then you're training and testing on the same dev queries?

You should probably use the queries here for a fair comparison w/ grid search? https://github.com/castorini/Anserini-data/tree/master/MSMARCO
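One way to keep the comparison fair is to draw tuning subsets that are disjoint from the queries used for the final evaluation. A minimal sketch of such sampling; the helper name, subset count, and sizes here are hypothetical, not taken from the linked repository:

```python
import random

def sample_tuning_subsets(query_ids, n_subsets=5, size=100, seed=0):
    """Draw disjoint subsets of query ids for parameter tuning; the
    remaining ids stay untouched for the final evaluation."""
    rng = random.Random(seed)
    pool = list(query_ids)
    rng.shuffle(pool)
    assert n_subsets * size <= len(pool), "not enough queries to split"
    return [pool[i * size:(i + 1) * size] for i in range(n_subsets)]

# Illustrative usage with 1000 fake query ids: 5 tuning subsets of 100
# queries each, leaving 500 held out for evaluation.
subsets = sample_tuning_subsets(range(1000), n_subsets=5, size=100)
```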

alexlimh commented 3 years ago

I just followed tune_bm25.py and didn't change the code except for the grid-search part.

As for the training queries, do you mean this one:

```python
# Evaluate with official scoring script
results = subprocess.check_output(['python', 'tools/scripts/msmarco/msmarco_passage_eval.py',
                                   'collections/msmarco-passage/qrels.train.tsv',
```
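The official scoring script reports MRR@10; for context, the metric itself can be sketched as follows. The `run`/`qrels` structures and the toy data are invented for illustration.

```python
def mrr_at_10(run, qrels):
    """run: qid -> ranked list of doc ids; qrels: qid -> set of relevant ids."""
    total = 0.0
    for qid, ranking in run.items():
        relevant = qrels.get(qid, set())
        for rank, docid in enumerate(ranking[:10], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(run)

# Toy example: q1 finds its relevant doc at rank 2, q2 never does.
run = {"q1": ["d3", "d1", "d2"], "q2": ["d9", "d8"]}
qrels = {"q1": {"d1"}, "q2": {"d7"}}
print(mrr_at_10(run, qrels))  # (1/2 + 0) / 2 = 0.25
```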

Here's the script I used:

```shell
python tools/scripts/msmarco/tune_bm25_skopt.py --base-directory runs_$metric \
        --index indexes/msmarco-passage/lucene-index-msmarco \
        --queries collections/msmarco-passage/queries.dev.small.tsv \
        --qrels-tsv collections/msmarco-passage/qrels.dev.small.tsv \
        --qrels-trec collections/msmarco-passage/qrels.dev.small.trec \
        --skopt-iters $iters \
        --hits $hits \
        --metric $metric \
        --seed $seed \
        --threads 16
```
alexlimh commented 3 years ago

I see the mistakes. Will take care of this today.

alexlimh commented 3 years ago

New results using 5 training subsets for tuning k1 and b:

| Setting | MRR@10 | MAP | Recall@1000 |
|---|---|---|---|
| Default (k1=0.9, b=0.4) | 0.1840 | 0.1926 | 0.8526 |
| Grid Search, Optimized for Recall@1000 (k1=0.82, b=0.68) | 0.1874 | 0.1957 | 0.8573 |
| Skopt, Optimized for Recall@1000 (k1=0.68, b=0.72) | 0.1890 | 0.1971 | 0.8575 |
| Grid Search, Optimized for MRR@10/MAP (k1=0.60, b=0.62) | 0.1892 | 0.1972 | 0.8555 |
| Skopt, Optimized for MAP (k1=0.63, b=0.62) | 0.1892 | 0.1972 | 0.8564 |
lintool commented 3 years ago

Closing issue. Skopt seems to be overkill for tuning BM25, since grid search suffices.