BM25 of Dureader-retrieval

PaddlePaddle / RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

Apache License 2.0

BM25 of Dureader-retrieval #22

Closed tangzhy closed 2 years ago

tangzhy commented 2 years ago

Hi, what values of the hyperparameters k1 and b did you use for your BM25 baseline?

HongyuLi2018 commented 2 years ago

Our BM25 implementation is based on Elasticsearch; we simply use the default hyperparameters (i.e., b=0.75, k1=1.2).
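For readers unfamiliar with the two parameters, here is a minimal sketch of how k1 and b enter a BM25 score. It uses a textbook-style formula with a Lucene-like IDF; this is an assumption for illustration and not necessarily the exact formula Elasticsearch computes, which may differ by constant factors. The function name `bm25_score` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query.

    Textbook-style BM25 (illustrative assumption, not Elasticsearch's code).
    k1 controls term-frequency saturation; b controls length normalization.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b=0 disables length normalization; b=1 applies it fully.
        norm = k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score


# Hypothetical toy corpus of pre-tokenized documents.
corpus = [["a", "b", "c"], ["a", "a", "d", "e", "f"], ["g", "h"]]
default_score = bm25_score(["a"], corpus[0], corpus)          # k1=1.2, b=0.75
no_length_norm = bm25_score(["a"], corpus[0], corpus, b=0.0)  # same k1, b=0
```

Because the first document's length differs from the corpus average, the two calls return different scores, which shows that the same data can rank differently once b (or k1) changes.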

tangzhy commented 2 years ago

Thanks for the reply. I'm a bit confused, though: I use the default hyperparameters, which are exactly the same as yours, and I found that my BM25 underperforms your published results on the cmedqa and covid test sets.

My implementation is based on pyserini, with the language set to Chinese and 1-gram retrieval.

Maybe there is some discrepancy between our setups, though it's not a big concern :)
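One place where two BM25 setups with identical k1 and b can still diverge is tokenization. The sketch below is only an illustration under that assumption, not a diagnosis of the gap reported here: it compares character 1-grams with overlapping character 2-grams on a made-up Chinese query. The helper names and the sample text are hypothetical, and neither function reproduces the analyzer of Elasticsearch or pyserini.

```python
def char_unigrams(text):
    """Split text into single characters, dropping whitespace (1-gram)."""
    return [ch for ch in text if not ch.isspace()]


def char_bigrams(text):
    """Split text into overlapping character pairs (2-gram)."""
    chars = char_unigrams(text)
    return [a + b for a, b in zip(chars, chars[1:])]


# Hypothetical sample query ("symptoms of the novel coronavirus").
text = "新冠病毒的症状"
unigram_tokens = char_unigrams(text)
bigram_tokens = char_bigrams(text)

# The two token lists share no tokens, so term frequencies, document
# frequencies, and document lengths (all BM25 inputs) would differ.
shared = set(unigram_tokens) & set(bigram_tokens)
```

Since every BM25 input depends on the token stream, comparing the analyzers on both sides is a reasonable first check when scores do not match.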