facebookresearch / DPR

Dense Passage Retriever (DPR) is a set of tools and models for the open-domain Q&A task.

Data from retriever is not working with reader #229

Open UmerTariq1 opened 2 years ago

UmerTariq1 commented 2 years ago

Background: I am working on an idea that extends DPR, so I used the DPR retriever to get some retrieval results and I want to use them with the DPR reader.

The retriever output (which should be the reader input) is a list, and each item in the list has this structure:

{
  "question": <str>,
  "answers": [ <str> ],
  "ctxs": [ { "id": <str>, "title": <str>, "text": <str>, "score": <str>, "has_answer": true/false } ]
}
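To rule out a malformed file before looking at the reader itself, the structure above can be checked with a few lines of plain Python. This is only a minimal sketch: `validate_record` and `REQUIRED_CTX_KEYS` are hypothetical names, not part of DPR, and the sample record is made up for illustration.

```python
import json

# Keys each entry of "ctxs" should carry, taken from the structure above.
REQUIRED_CTX_KEYS = {"id", "title", "text", "score", "has_answer"}


def validate_record(record):
    """Return a list of problems found in one retriever-output record."""
    problems = []
    if not isinstance(record.get("question"), str):
        problems.append("question must be a string")
    answers = record.get("answers")
    if not isinstance(answers, list) or not all(isinstance(a, str) for a in answers):
        problems.append("answers must be a list of strings")
    ctxs = record.get("ctxs")
    if not isinstance(ctxs, list):
        problems.append("ctxs must be a list")
        return problems
    for i, ctx in enumerate(ctxs):
        if not isinstance(ctx, dict):
            problems.append(f"ctxs[{i}] must be an object")
            continue
        missing = REQUIRED_CTX_KEYS - set(ctx)
        if missing:
            problems.append(f"ctxs[{i}] is missing keys: {sorted(missing)}")
    return problems


# Made-up sample record in the format described above.
sample = json.loads("""
{
  "question": "who wrote hamlet",
  "answers": ["William Shakespeare"],
  "ctxs": [
    {"id": "1", "title": "Hamlet", "text": "Hamlet is a tragedy by William Shakespeare.",
     "score": "81.2", "has_answer": true}
  ]
}
""")

print(validate_record(sample))
```

For a real file, the same function could be run over every item of the list returned by `json.load`.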

But I am unable to use this file as input to the reader. I think the reason is that its structure differs from what DPR expects. I successfully ran the DPR reader on the nq-single dataset, but that data is in the ReaderSample format (question, answers, positive_passages, .....).
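To make the difference between the two layouts concrete, here is a rough sketch that groups the `ctxs` of one retriever record by their `has_answer` flag. It is an illustration only, not DPR's actual preprocessing: `split_passages` is a hypothetical helper, `negative_passages` is an assumed field name, and the remaining ReaderSample fields are not reproduced here.

```python
def split_passages(record):
    """Group a retriever record's passages by whether they contain an answer."""
    positives = [ctx for ctx in record["ctxs"] if ctx["has_answer"]]
    negatives = [ctx for ctx in record["ctxs"] if not ctx["has_answer"]]
    return {
        "question": record["question"],
        "answers": record["answers"],
        "positive_passages": positives,
        # Assumed name; not confirmed against the DPR source.
        "negative_passages": negatives,
    }


# Made-up record in the retriever output format.
record = {
    "question": "who wrote hamlet",
    "answers": ["William Shakespeare"],
    "ctxs": [
        {"id": "1", "title": "Hamlet", "text": "A tragedy by William Shakespeare.",
         "score": "81.2", "has_answer": True},
        {"id": "2", "title": "Macbeth", "text": "A tragedy set in Scotland.",
         "score": "79.5", "has_answer": False},
    ],
}

grouped = split_passages(record)
print(len(grouped["positive_passages"]), len(grouped["negative_passages"]))
```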

My question: Is there any way I can convert the JSON file I have to the format required by the reader?

Relevant issue: I found issue #73 to be somewhat related, but I think the code has changed since then, because I am unable to find the file preprocess_reader_data.py.

What I am expecting: I want to use this JSON file (whose structure is described above) as the input to the DPR reader (train_extractive_reader.py).

tkabir1 commented 1 year ago

Have you been able to solve this issue? Any help will be appreciated.