bbuing9 closed this issue 1 year ago.
Hi @bbuing9, the contexts in those files are NOT the result of BM25 retrieval. They are the contexts that come with the reading comprehension version of each dataset. See `processing_scripts/process_{dataset_name}.py` for how those preprocessed files are created. I am closing this for now, but please feel free to reopen if you have more questions.
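If you want to see which fields a processed file actually contains before reading the matching processing script, a quick inspection helps. Below is a minimal sketch, assuming only that the processed files are standard JSON Lines (one JSON object per line). It does not assume any particular field names. The helper names `iter_jsonl` and `summarize_keys` are hypothetical, not part of this codebase.

```python
import json


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)


def summarize_keys(path, limit=5):
    """Return the sorted union of top-level keys across the first `limit` records."""
    keys = set()
    for i, record in enumerate(iter_jsonl(path)):
        if i >= limit:
            break
        keys.update(record.keys())
    return sorted(keys)
```

For example, `summarize_keys("path/to/test_subsampled.jsonl")` (the path is a placeholder) lists the top-level fields, so you can find the context field and trace it back to the corresponding `process_{dataset_name}.py` script.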
Hi, first of all, thank you for the great work!
I really enjoyed reading the paper, and the proposed idea, with its promising results, was really interesting.
I am now trying to use this codebase for my own project and have a question about the files in processed_data.
In the processed JSONL files (e.g., test_subsampled.jsonl), contexts are already included for every dataset.
Are these contexts the result of a single round of BM25 retrieval? If not, how are they obtained?
An answer to this question would be really helpful.
Thank you so much!