
canghongjian / beam_retriever

[NAACL 2024] End-to-End Beam Retrieval for Multi-Hop Question Answering

https://arxiv.org/abs/2308.08973

Apache License 2.0


Regarding dataset IIRC #10

Closed yc-song closed 4 weeks ago

yc-song commented 1 month ago

Hi. Thanks for your interesting work.

I'm wondering if you could provide a minimal guide for running your codebase on the IIRC dataset (Appendix D).

Also, I'm curious whether the results in Appendix D were obtained on the official IIRC dataset or whether you applied some other preprocessing.

Best regards.

canghongjian commented 1 month ago

Hello, thanks for your interest in our work! You should preprocess the original IIRC dataset using the method described in Appendix D (more specifically, we follow the steps in IRCoT); you can then train Beam Retrieval by editing the dataset_type in the bash file. The Retrieval EM is calculated in the same way as for MuSiQue, because the preprocessed IIRC data shares the same format as MuSiQue.
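
For readers following along, here is a rough sketch of what a MuSiQue-style Retrieval EM check could look like. It is not taken from the repository. It assumes each example carries a `paragraphs` list whose entries have `idx` and `is_supporting` fields (as in the public MuSiQue release), that predictions are lists of paragraph indices, and that order is ignored when comparing. The function names `gold_indices` and `retrieval_em` are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def gold_indices(example: dict) -> set[int]:
    """Collect the indices of the supporting paragraphs of one example.

    Assumes a MuSiQue-style record: example["paragraphs"] is a list of dicts
    with "idx" and "is_supporting" keys (an assumption, not the repo's code).
    """
    return {p["idx"] for p in example["paragraphs"] if p["is_supporting"]}


def retrieval_em(examples: list[dict], predictions: list[Iterable[int]]) -> float:
    """Fraction of examples whose predicted paragraph set equals the gold set.

    Order is ignored here; that is a simplifying assumption of this sketch.
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    hits = sum(
        set(pred) == gold_indices(ex) for ex, pred in zip(examples, predictions)
    )
    return hits / len(examples)


# Toy data in the assumed format: the first prediction matches, the second does not.
toy_examples = [
    {"paragraphs": [{"idx": 0, "is_supporting": True},
                    {"idx": 1, "is_supporting": False},
                    {"idx": 2, "is_supporting": True}]},
    {"paragraphs": [{"idx": 0, "is_supporting": False},
                    {"idx": 1, "is_supporting": True}]},
]
toy_predictions = [[2, 0], [0, 1]]
toy_score = retrieval_em(toy_examples, toy_predictions)  # 0.5
```

Because the preprocessed IIRC data is said to share MuSiQue's format, the same check would apply to both datasets under these assumptions.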

yc-song commented 4 weeks ago

Everything clear. Thanks.