zhiyuanpeng closed this issue 1 year ago
I re-ran the notebook and got a very close NDCG@10 score: 0.69064 on SciFact. Both NDCG@10 scores I got are much higher than the 0.665 reported in Table 2.
For MS MARCO you have to run it on the dev split.
@nreimers
Thanks for your reply. On the MS MARCO dev split, I got a very close NDCG@10 score: 0.22747. I am confused about reporting results on the dev set instead of the test set: the SBERT training script already evaluates the model on the dev set during training, so why report results on dev? Also, could you clarify how to use your splits, i.e. which split is for training, which is for evaluation during training, and which is for final testing? Thank you very much!
MS MARCO doesn't have a test set. Here in BEIR the test split is TREC DL 2019. It is quite confusing.
People report MS MARCO results on the dev set. This dev set shouldn't be used for training, early stopping, etc.
@nreimers Thank you for the clarification! BTW, I can't reproduce your BM25 NDCG@10 on FEVER. I ran your notebook and got NDCG@10 0.64938, which is much lower than the 0.753 reported in Table 2. On dev, the NDCG@10 is 0.66363, which is also much lower than 0.753.
Update: BEIR reports Anserini BM25. I will run it. Thanks.
To reproduce the official BEIR scores, Pyserini is probably easier: https://github.com/castorini/pyserini/
Specifically, try: https://castorini.github.io/pyserini/2cr/beir.html
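For reference, the 2CR page boils down to commands of roughly this shape (shown here for FEVER; the prebuilt index/topic names and flags below follow the page's naming conventions and should be double-checked against it, since they may change between Pyserini releases):

```shell
# Retrieve with BM25 over Pyserini's prebuilt BEIR FEVER index
python -m pyserini.search.lucene \
  --index beir-v1.0.0-fever.flat \
  --topics beir-v1.0.0-fever-test \
  --output run.beir.bm25.fever.txt \
  --output-format trec \
  --hits 1000 --bm25 --remove-query

# Score the run with trec_eval's NDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 \
  beir-v1.0.0-fever-test run.beir.bm25.fever.txt
```

This path avoids Elasticsearch entirely, which removes one source of divergence (analyzer and BM25 parameter defaults) from the official Anserini numbers.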
You'll be able to get the scores on the official BEIR leaderboard: https://eval.ai/web/challenges/challenge-page/1897/overview
cc/ @thakur-nandan
@lintool Thanks. My reproduced NDCG@10 on FEVER is much closer to your result of 0.6513. BEIR uses Anserini BM25; I will run it to reproduce the results.
Hi,
I installed Elasticsearch on Debian 10 and ran BM25 on the BEIR MS MARCO dataset. I obtain NDCG@10 0.4769, which is much higher than the 0.228 in Table 2 of your paper. On SciFact, I get NDCG@10 0.6906, which is also much higher than the 0.665 in Table 2. Any suggestions? Thanks
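Since every gap discussed in this thread is an NDCG@10 gap, it may help to pin down the metric itself: BEIR evaluates runs with pytrec_eval, which follows trec_eval's formulation (linear gain, log2 discount). A minimal single-query sketch (the function name and data layout here are illustrative, not BEIR's API):

```python
import math

def ndcg_at_k(ranked_ids, qrels, k=10):
    """trec_eval-style NDCG@k for one query.

    ranked_ids: doc ids in retrieved order.
    qrels: dict mapping doc id -> graded relevance.
    """
    # DCG: gain is the raw relevance grade (linear gain),
    # discounted by log2(rank + 1) with rank starting at 1.
    dcg = sum(qrels.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]))
    # Ideal DCG: the k highest grades in the qrels, descending.
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Even with the metric fixed, small implementation differences on the retrieval side (BM25 parameters, tokenization/analyzer, number of hits retrieved, and above all which split is scored) can easily move NDCG@10 by the margins seen above.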