Closed: kevinxyc1 closed this issue 2 years ago
Hi @kevinxyc1, unfortunately we don't provide the pre-processing pipeline in this repository, for multiple reasons. Sorry. In a nutshell, it runs WikiExtractor and then applies some filtering code from the DrQA repository. There are better preprocessing options in the pyserini toolkit (the link is in the readme).
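For anyone who wants a rough starting point, here is a minimal sketch of the two steps mentioned above (extract, then filter). It is not the code used for this repository. It assumes WikiExtractor was run with its `--json` option so that each output line is a JSON object with `id`, `title`, and `text` fields. The title rules and the names `keep_article` and `filter_articles` are hypothetical, only loosely inspired by the kind of filtering DrQA applies, so expect the resulting article set to differ from ours.

```python
import json
import re

# Assumed input: JSON-lines files produced by something like
#   python -m wikiextractor.WikiExtractor <dump.xml.bz2> --json -o extracted
# The title rules below are illustrative assumptions, not the original filter.
_SKIP_TITLE = re.compile(
    r"\(disambiguation\)|^(list|index|outline) of ", re.IGNORECASE
)


def keep_article(article):
    """Return True if the article has text and a title we do not skip."""
    title = article.get("title", "")
    text = article.get("text", "")
    if not text.strip():
        return False  # drop empty pages such as redirects
    return _SKIP_TITLE.search(title) is None


def filter_articles(lines):
    """Yield cleaned article dicts from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        article = json.loads(line)
        if keep_article(article):
            yield {
                "id": article.get("id"),
                "title": article.get("title", ""),
                "text": article.get("text", ""),
            }
```

To use it, open each extracted file and pass the file object to `filter_articles`, then write the surviving articles wherever your indexing step expects them.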
I was wondering whether there is a way to replicate your Wikipedia preprocessing. Thanks!