Closed thigm85 closed 2 years ago
Hi @thigm85,
I used this Python script here: evaluate_bm25.py to generate the scores for Elasticsearch BM25 on the leaderboard.
However, as noted recently in #58, some metrics may differ from what I originally got for the leaderboard. I will rerun them soon and update the leaderboard with the latest metrics. In the meantime, since these changes are not yet reflected in the latest pip version, you can check out the development branch locally and use evaluate_bm25.py to get accurate Elasticsearch BM25 scores.
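If it helps, installing from the development branch usually looks like the following; the repository URL and script path here are assumptions based on the BEIR project layout and may differ:

```shell
# Install BEIR from the development branch instead of the pip release
git clone https://github.com/beir-cellar/beir.git
cd beir
pip install -e .

# Run the BM25 evaluation script (requires a running Elasticsearch instance)
python examples/retrieval/evaluation/lexical/evaluate_bm25.py
```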
Kind Regards, Nandan Thakur
Thanks for the reply.
Hi @NThakur20,
Could you confirm whether the metrics have been updated? I am unable to reproduce them with the latest code.
Is the complete script used to generate the leaderboard available somewhere? I have seen snippets such as benchmark_bm25.py, but not a full-scale script that includes the Elasticsearch config and everything else.
I am implementing a BEIR-compatible Vespa version that I plan to submit as a PR soon. However, my BM25 metrics differ from the Elasticsearch BM25 results on the leaderboard.
Generating results side by side would be very helpful for debugging my implementation.
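For side-by-side debugging, scoring both run files with the same evaluator rules out metric-computation differences and isolates the retrieval side. Below is a minimal pure-Python sketch of nDCG@10 (the leaderboard's primary metric) using the linear-gain, log2-discount formulation; trec_eval-style tooling such as pytrec_eval, which BEIR relies on, should agree on binary judgments. The qrels/run data here are hypothetical toy values, not from any BEIR dataset:

```python
import math

def ndcg_at_k(qrels, run, k=10):
    """nDCG@k averaged over queries.

    qrels: {qid: {docid: relevance}}, run: {qid: {docid: score}}.
    Linear gain with the standard log2(rank + 1) discount.
    """
    scores = []
    for qid, rels in qrels.items():
        docs = run.get(qid, {})
        ranked = sorted(docs, key=docs.get, reverse=True)[:k]
        dcg = sum(rels.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked))
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: one query with two relevant documents.
qrels = {"q1": {"d1": 1, "d2": 1}}
run = {"q1": {"d1": 0.9, "d3": 0.8, "d2": 0.7}}
print(round(ndcg_at_k(qrels, run, k=10), 4))  # → 0.9197
```

Feeding both the Vespa and the Elasticsearch run through the same function like this makes any remaining gap attributable to retrieval (analyzer, BM25 parameters, tie-breaking) rather than to the evaluator.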