A question about "passages_index"

Hi, authors. I'm trying to replicate your FiD project and have a question about the data preprocessing.

As far as I can tell, the "passages_index" files for the Natural Questions and TriviaQA datasets are simply downloaded from "https://dl.fbaipublicfiles.com/FiD/data/[dataset-name].tar.gz". However, I could not find any details on how these passages_index files were generated. Are the passages just ranked in descending order of their Lucene BM25 scores, with passages that do not contain an answer excluded? Or did you use a different method to generate passages_index?
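To make the question concrete, here is a minimal sketch of the procedure I am guessing at. It is only my assumption, not something taken from your code: the function name `rank_passages`, the `keep_only_answer_bearing` flag, the example passages, and the score values are all hypothetical, and the BM25 scores are assumed to have been computed beforehand (for example with Lucene).

```python
def rank_passages(scored_passages, answers, keep_only_answer_bearing=True):
    """Rank (text, bm25_score) pairs by descending score.

    Hypothetical helper illustrating my guess: optionally drop passages
    that contain none of the answer strings (case-insensitive substring
    match), then sort the rest from highest to lowest score.
    """
    answers_lower = [a.lower() for a in answers]
    kept = []
    for text, score in scored_passages:
        has_answer = any(a in text.lower() for a in answers_lower)
        if keep_only_answer_bearing and not has_answer:
            continue  # assumed filtering step: skip passages without an answer
        kept.append((text, score))
    # Highest score first
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up passages and scores, purely for illustration
candidates = [
    ("Paris is the capital of France.", 7.2),
    ("France is a country in Europe.", 9.1),
    ("The capital city, Paris, lies on the Seine.", 8.4),
]
ranked = rank_passages(candidates, answers=["Paris"])
for text, score in ranked:
    print(score, text)
```

If the actual pipeline keeps passages without answers, or uses a different retriever or matching rule, that is exactly the detail I would like to know.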

Looking forward to your reply.
