facebookresearch / FiD

Fusion-in-Decoder

A question about "passages_index" #10

Closed chuzhumin98 closed 2 years ago

chuzhumin98 commented 2 years ago

Hi, authors. I'm trying to replicate your FiD project, and I have a question about the data preprocessing strategy.

I found that the "passages_index" files for the Natural Questions and TriviaQA datasets are simply downloaded from the URL "https://dl.fbaipublicfiles.com/FiD/data/[dataset-name].tar.gz". However, I could not find any details about how these passages_index files were generated. Are the passages simply ranked in descending order of their Lucene-BM25 scores (excluding the passages that do not contain answers)? Or did you adopt another method to generate the passages_index?

Looking forward to your reply.

gizacard commented 2 years ago

Hi,

The passages we have released in our repository were obtained with a retriever trained by distilling the reader into the retriever. The method is described here: https://arxiv.org/pdf/2012.04584.pdf. The retriever can be downloaded from the repo.
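
For readers who want a feel for what "distilling the reader into the retriever" means, here is a rough sketch, not the code from the FiD repo. It assumes the reader gives one aggregated cross-attention score per passage and the retriever gives one similarity score per passage, and that the retriever is trained to match the reader's distribution with a KL divergence; the linked paper has the exact objective. The function names (`log_softmax`, `distillation_kl`, `rank_passages`) are hypothetical and exist only for this illustration.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def distillation_kl(reader_scores, retriever_scores):
    """KL(reader || retriever) over the passages of one question.

    reader_scores: assumed aggregated cross-attention score per passage.
    retriever_scores: assumed retriever similarity score per passage.
    """
    log_target = log_softmax(reader_scores)
    log_pred = log_softmax(retriever_scores)
    return sum(math.exp(lt) * (lt - lp) for lt, lp in zip(log_target, log_pred))


def rank_passages(passages, retriever_scores):
    """Order passages by descending retriever score."""
    order = sorted(range(len(passages)),
                   key=lambda i: retriever_scores[i], reverse=True)
    return [passages[i] for i in order]


if __name__ == "__main__":
    reader = [2.0, 0.5, 1.0]      # made-up attention scores
    retriever = [0.1, 0.9, 0.5]   # made-up retriever scores
    print(distillation_kl(reader, retriever))
    print(rank_passages(["p0", "p1", "p2"], retriever))
```

Under these assumptions, the loss is zero when the retriever's distribution over passages matches the reader's, and the released passage order would come from sorting by the trained retriever's scores rather than by BM25.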