chuzhumin98 closed this issue 2 years ago
Hi,
The passages released in our repository were obtained by distilling the reader into the retriever; the method is described here: https://arxiv.org/pdf/2012.04584.pdf. The retriever can be downloaded from the repo.
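For readers following the thread, here is a rough sketch of what "distilling the reader into the retriever" could look like as a training objective. It is not code from the FiD repository. It assumes that the reader yields one aggregated cross-attention score per passage, that both score lists are normalized with a softmax, and that the retriever is trained to minimize the KL divergence between the two distributions. The function names (`softmax`, `distillation_loss`, `rank_passages`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(reader_attention, retriever_scores):
    """KL(target || predicted) over the passages of one question.

    reader_attention: one aggregated attention score per passage (assumed input).
    retriever_scores: one question-passage similarity score per passage.
    """
    target = softmax(reader_attention)
    predicted = softmax(retriever_scores)
    return sum(t * (math.log(t) - math.log(p)) for t, p in zip(target, predicted))


def rank_passages(retriever_scores):
    """Return passage indices sorted by descending retriever score."""
    return sorted(
        range(len(retriever_scores)),
        key=lambda i: retriever_scores[i],
        reverse=True,
    )
```

Under these assumptions the loss is zero when the retriever's distribution matches the reader's and grows as the two disagree. Once trained, the retriever scores alone would order the passages for each question.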
Hi, authors. I'm now going to replicate your FiD project. I'm wondering about the data preprocessing strategies.
I found that the "passages_index" files for the Natural Questions and TriviaQA datasets are simply downloaded from the URL "https://dl.fbaipublicfiles.com/FiD/data/[dataset-name].tar.gz". However, I could not find any details on how these passages_index files were generated. Are the passages just ranked in descending order of their Lucene-BM25 scores (excluding passages that do not contain answers)? Or did you adopt another method to generate the passages_index?
Looking forward to your reply.