lintool closed this issue 3 years ago.
No problem, will look into that after EMNLP.
Hyperparameter tuning results on MS MARCO using skopt (Gaussian process) for 50 iterations. The original results using grid search can be found here: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md
Setting | MRR@10 | MAP | Recall@1000 |
---|---|---|---|
Default (k1=0.9,b=0.4) | 0.1840 | 0.1926 | 0.8526 |
Grid Search, Optimized for recall@1000 (k1=0.82, b=0.68) | 0.1874 | 0.1957 | 0.8573 |
Skopt, Optimized for recall@1000 (k1=0.75, b=0.87) | 0.1885 | 0.1966 | 0.8596 |
Grid Search, Optimized for MRR@10/MAP (k1=0.60, b=0.62) | 0.1892 | 0.1972 | 0.8555 |
Skopt, Optimized for MRR@10 (k1=0.61, b=0.78) | 0.1907 | 0.1987 | 0.8578 |
Skopt, Optimized for MAP (k1=0.60, b=0.81) | 0.1908 | 0.1989 | 0.8581 |
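For context on what the two tuned parameters control, here is a toy sketch of the textbook BM25 term-score formula (Anserini's Lucene implementation differs in details, so treat this as illustrative only): k1 governs term-frequency saturation, and b governs document-length normalization.

```python
from math import log

def bm25_term_score(tf, df, N, dl, avgdl, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term.

    k1 controls term-frequency saturation; b controls how strongly the
    score is normalized by document length (b=0 disables normalization).
    """
    idf = log(1 + (N - df + 0.5) / (df + 0.5))
    norm = k1 * (1 - b + b * dl / avgdl)
    return idf * tf * (k1 + 1) / (tf + norm)

# With b > 0, a longer-than-average document is penalized relative to a
# shorter one containing the same term frequency.
short = bm25_term_score(tf=3, df=100, N=8_800_000, dl=40, avgdl=56, b=0.4)
long_ = bm25_term_score(tf=3, df=100, N=8_800_000, dl=90, avgdl=56, b=0.4)
```

Raising b pushes the collection toward shorter passages, which is one plausible reason the recall-optimized settings above land at higher b than the default 0.4.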
Nice, so there's still a bit more to be gained!
A few questions:
But then you're training and testing on the same dev queries?
You should probably use the queries here for a fair comparison w/ grid search: https://github.com/castorini/Anserini-data/tree/master/MSMARCO
I just followed tune_bm25.py and didn't change the code except for the grid search part.
As for the training queries, do you mean this one:
```python
# Evaluate with official scoring script
results = subprocess.check_output(['python', 'tools/scripts/msmarco/msmarco_passage_eval.py',
                                   'collections/msmarco-passage/qrels.train.tsv',
```
Here's the script I used:
```bash
python tools/scripts/msmarco/tune_bm25_skopt.py --base-directory runs_$metric \
    --index indexes/msmarco-passage/lucene-index-msmarco \
    --queries collections/msmarco-passage/queries.dev.small.tsv \
    --qrels-tsv collections/msmarco-passage/qrels.dev.small.tsv \
    --qrels-trec collections/msmarco-passage/qrels.dev.small.trec \
    --skopt-iters $iters \
    --hits $hits \
    --metric $metric \
    --seed $seed \
    --threads 16
```
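The internals of tune_bm25_skopt.py aren't shown in this thread, but one detail worth noting for anyone reimplementing it: skopt's optimizer minimizes its objective, so a higher-is-better metric like MRR@10 has to be negated. A hypothetical sketch (`evaluate_run` is a stand-in for running retrieval plus the official eval script, not the script's real code):

```python
def evaluate_run(k1, b):
    # Stand-in for "run retrieval with (k1, b), then score the run";
    # pretend the metric peaks near (k1=0.6, b=0.8).
    return 0.19 - (k1 - 0.6) ** 2 - (b - 0.8) ** 2

def objective(params):
    # skopt minimizes, so negate the metric to maximize it.
    k1, b = params
    return -evaluate_run(k1, b)
```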
I see the mistakes. Will take care of this today.
New results using 5 training subsets for tuning k1 and b:
Setting | MRR@10 | MAP | Recall@1000 |
---|---|---|---|
Default (k1=0.9,b=0.4) | 0.1840 | 0.1926 | 0.8526 |
Grid Search, Optimized for recall@1000 (k1=0.82, b=0.68) | 0.1874 | 0.1957 | 0.8573 |
Skopt, Optimized for recall@1000 (k1=0.68, b=0.72) | 0.1890 | 0.1971 | 0.8575 |
Grid Search, Optimized for MRR@10/MAP (k1=0.60, b=0.62) | 0.1892 | 0.1972 | 0.8555 |
Skopt, Optimized for MAP (k1=0.63, b=0.62) | 0.1892 | 0.1972 | 0.8564 |
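The averaging step implied by "5 training subsets" can be sketched like this (metric values below are made up for illustration, not real MS MARCO numbers): score each candidate (k1, b) on every subset and keep the setting with the best mean.

```python
candidates = [(0.60, 0.62), (0.68, 0.72), (0.90, 0.40)]

# metric_by_fold[params][i] = metric (e.g. MRR@10) on training subset i
metric_by_fold = {
    (0.60, 0.62): [0.186, 0.190, 0.188, 0.189, 0.187],
    (0.68, 0.72): [0.186, 0.189, 0.190, 0.188, 0.188],
    (0.90, 0.40): [0.182, 0.185, 0.184, 0.183, 0.184],
}

def mean(xs):
    return sum(xs) / len(xs)

# Pick the parameters with the best average across subsets.
best = max(candidates, key=lambda p: mean(metric_by_fold[p]))
```

Averaging over held-out subsets avoids the earlier mistake of tuning and evaluating on the same dev queries.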
Closing issue. Skopt seems to be overkill for tuning BM25, since grid search suffices.
I do grid search for tuning BM25 here: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md#bm25-tuning
Which is kinda stupid.
We should use skopt: https://scikit-optimize.github.io/stable/
@alexlimh can you please contribute this after EMNLP?
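For reference, the coarse sweep in the linked doc can be sketched as a nested grid search (the ranges and the stand-in metric below are illustrative, not the doc's actual values):

```python
def metric(k1, b):
    # Stand-in for "run retrieval with (k1, b), then score the run";
    # pretend recall@1000 peaks near (0.82, 0.68).
    return 0.857 - 0.01 * ((k1 - 0.82) ** 2 + (b - 0.68) ** 2)

grid_k1 = [round(0.1 * i, 1) for i in range(1, 21)]  # 0.1 .. 2.0
grid_b = [round(0.1 * i, 1) for i in range(1, 11)]   # 0.1 .. 1.0

# Exhaustively evaluate every grid point and keep the best one.
best_k1, best_b = max(
    ((k1, b) for k1 in grid_k1 for b in grid_b),
    key=lambda p: metric(*p),
)
```

The appeal of skopt is that a Gaussian-process model can reach a comparable optimum with far fewer metric evaluations than the full grid, since each evaluation here means a complete retrieval run plus scoring.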