StonyBrookNLP / ircot

Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL23
https://arxiv.org/abs/2212.10509
Apache License 2.0

Contexts in processed_data #9

Closed by bbuing9 1 year ago

bbuing9 commented 1 year ago

Hi, first of all, thank you for the great work!

I really enjoyed reading the paper, and the proposed idea with promising results was really interesting.

Now, I am trying to use this codebase for my own project and have a question about the processed_data.

In the processed_jsonl file (e.g., test_subsampled.jsonl), the contexts are already included for all datasets.

Are these contexts the result of a single BM25 retrieval step? If not, how are they obtained?

If you can provide the answer to this question, it would be really useful.

Thank you so much!

HarshTrivedi commented 1 year ago

Hi @bbuing9, the contexts in those files are NOT the result of BM25 retrieval. They are the contexts that come with the reading comprehension version of these datasets. Check out processing_scripts/process_{dataset_name}.py to see how those preprocessed files are created. I am closing this for now, but please feel free to reopen if you have more questions.
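
For anyone who wants to check what is stored in a processed file such as test_subsampled.jsonl, here is a minimal Python sketch that reads a JSONL file and reports the top-level fields and the number of contexts per record. It is not part of the repository. The function name `summarize_jsonl` is hypothetical, and the default field name `contexts` is an assumption based on the wording of this thread, so adjust it to match the actual files.

```python
import json


def summarize_jsonl(path, contexts_key="contexts"):
    """Summarize a JSONL file with one JSON object per line.

    Returns a tuple of:
      - the number of records,
      - the sorted union of top-level keys seen across records,
      - a list with the number of contexts in each record.

    `contexts_key` is an assumed field name; records that lack it count as 0.
    """
    num_records = 0
    keys = set()
    context_counts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            num_records += 1
            keys.update(record.keys())
            context_counts.append(len(record.get(contexts_key, [])))
    return num_records, sorted(keys), context_counts
```

Calling `summarize_jsonl("path/to/test_subsampled.jsonl")` (the path here is a placeholder) gives a quick view of which fields each record carries, which can then be compared against what the corresponding processing script writes.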