facebookresearch / DPR

Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.

How to replicate the wiki preprocessing #182

Closed. kevinxyc1 closed this issue 2 years ago

kevinxyc1 commented 3 years ago

I was wondering whether there is a way to replicate your Wikipedia preprocessing. Thanks!

vlad-karpukhin commented 3 years ago

Hi @kevinxyc1, unfortunately we don't provide the preprocessing pipeline in this repository, for multiple reasons. Sorry. In a nutshell, it uses WikiExtractor plus some filtering code from the DrQA repository. There are better preprocessing options in the pyserini toolkit (the link is in the README).
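
For anyone looking for a starting point, the filtering step described above could be approximated roughly as follows. This is only a sketch and not the pipeline used for DPR. It assumes WikiExtractor was run with its `--json` option, so each output line is a JSON object with `id`, `title`, and `text` fields. The title filters are modeled on the kind of checks DrQA's Wikipedia prep script applies (disambiguation and list/index/outline pages). The empty-text check and the function names `keep_article` and `filter_lines` are hypothetical additions for illustration.

```python
import html
import json
import re

# Pages whose titles match these patterns are mostly links.
# Assumption: modeled on DrQA-style filtering, not the exact DPR rules.
LINK_PAGE = re.compile(r"(List of .+)|(Index of .+)|(Outline of .+)")


def keep_article(article):
    """Return a cleaned article dict, or None if the article should be dropped."""
    # Undo HTML escaping that the extractor may have left behind.
    title = html.unescape(article["title"])
    text = html.unescape(article["text"])

    lowered = title.lower()
    if "(disambiguation)" in lowered or "(disambiguation page)" in lowered:
        return None
    if LINK_PAGE.match(title):
        return None
    # Assumption: skip articles with no body text (e.g. leftover redirects).
    if not text.strip():
        return None
    return {"id": article["id"], "title": title, "text": text}


def filter_lines(lines):
    """Yield cleaned articles from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cleaned = keep_article(json.loads(line))
        if cleaned is not None:
            yield cleaned
```

To use it, open each WikiExtractor output file and pass the file object to `filter_lines`, then write the surviving articles wherever the next stage expects them. Any passage splitting or further cleanup would be a separate step that this sketch does not cover.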