DPR+BM25 - Githubissues

facebookresearch / DPR

Dense Passage Retriever (DPR) is a set of tools and models for open-domain Q&A tasks.

DPR+BM25 #84

Closed. alexlimh closed this issue 3 years ago

alexlimh commented 3 years ago

Hi,

I'm wondering whether there is code for DPR + BM25 as described in your paper:

"In addition to DPR, we also present the results of BM25, the traditional retrieval method and BM25+DPR, using a linear combination of their scores as the new ranking function. Specifically, we obtain two initial sets of top-2000 passages based on BM25 and DPR, respectively, and rerank the union of them using BM25(q,p) + λ · sim(q, p) as the ranking function. We used λ = 1.1 based on the retrieval accuracy in the development set."

Thanks, Minghan
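
For readers following this thread, the reranking step in the quoted passage can be sketched roughly as below. This is not the authors' code (their BM25 side used Lucene and Java). The function name `hybrid_rerank`, its parameters, and the dictionary inputs are hypothetical. The quote also does not say how to score a passage that only one retriever returned, so the fallback to that retriever's lowest observed score is purely an assumption made here for illustration.

```python
from typing import Dict, List, Tuple


def hybrid_rerank(
    bm25_scores: Dict[str, float],
    dense_scores: Dict[str, float],
    lam: float = 1.1,
    top_k: int = 100,
) -> List[Tuple[str, float]]:
    """Rerank the union of two candidate sets by BM25(q,p) + lam * sim(q,p).

    bm25_scores / dense_scores map passage id -> score for ONE question,
    e.g. the top-2000 passages from each retriever.
    """
    # ASSUMPTION (not from the paper): a passage missing from one list gets
    # that list's lowest observed score, or 0.0 if the list is empty.
    bm25_floor = min(bm25_scores.values(), default=0.0)
    dense_floor = min(dense_scores.values(), default=0.0)

    combined = {}
    for pid in set(bm25_scores) | set(dense_scores):
        bm25 = bm25_scores.get(pid, bm25_floor)
        dense = dense_scores.get(pid, dense_floor)
        combined[pid] = bm25 + lam * dense

    # Sort by combined score, highest first; break ties by passage id
    # so the output is deterministic.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    bm25 = {"p1": 12.0, "p2": 8.0}
    dense = {"p2": 80.0, "p3": 78.0}
    for pid, score in hybrid_rerank(bm25, dense, lam=1.1, top_k=3):
        print(pid, round(score, 2))
```

An alternative to the fallback would be to compute the missing score directly (run BM25 or the dense encoder on every passage in the union), which costs more but avoids the assumption.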

vlad-karpukhin commented 3 years ago

Hi Minghan, we don't provide a BM25 index or code. We used Lucene and Java, and including them would add Java and Lucene installation requirements to our project, making it less convenient for the community to use. In any case, as you can see from the final results, the hybrid approach is not generally better than the dense-only scheme.

alexlimh commented 3 years ago

Hi Vladimir,

Thanks for your reply. I see your point, but I still think it's necessary for reimplementation. Would it be acceptable if I used, for example, Elasticsearch for BM25 and submitted it as a pull request? It would be something like this: https://huggingface.co/docs/datasets/_modules/nlp/search.html

Best, Minghan

vlad-karpukhin commented 3 years ago

To bring a BM25 implementation into the repo, I'd use the Anserini framework instead of Elasticsearch: https://github.com/castorini/pyserini

alexlimh commented 3 years ago

Ah, you are right, pyserini is indeed better. Thanks for the information.

Minghan

vlad-karpukhin commented 3 years ago

I guess we can close this issue.