canghongjian / beam_retriever

[NAACL 2024] End-to-End Beam Retrieval for Multi-Hop Question Answering
https://arxiv.org/abs/2308.08973
Apache License 2.0

Request for the preprocessed datasets #11

Closed by meaningful96 1 month ago

meaningful96 commented 1 month ago

Hi, thank you for sharing your great work!

Unfortunately, I'm having trouble preprocessing the data and training the model. Could you share the preprocessed data (for example, via Google Drive) and explain the training process in detail?

canghongjian commented 1 month ago

Hi, thanks for your interest! Taking HotpotQA as an example: first download the original data, train_file and predict_file, from the official website. Then edit the startup script run_train_beam_retr.sh so that train_file and predict_file point to the actual paths of the downloaded files. After that, start training by running sh run_train_beam_retr.sh.
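Before launching training, it may help to confirm that the downloaded files look the way a HotpotQA loader would expect. The following is a minimal sketch, not part of this repository: the helper names validate_examples and check_data_file are hypothetical, and the required keys are an assumption based on the public HotpotQA JSON format (a list of records, each with a "question" and a "context" field), not on how the training script actually reads the data.

```python
import json
from pathlib import Path

# Assumption: each HotpotQA record is a JSON object with at least these keys.
# This reflects the public dataset format, not this repository's loader.
REQUIRED_KEYS = {"question", "context"}


def validate_examples(examples):
    """Check the basic shape of parsed records and return how many there are."""
    if not isinstance(examples, list):
        raise ValueError("expected a JSON list of examples")
    for i, example in enumerate(examples):
        # set(example) yields the keys when example is a dict.
        missing = REQUIRED_KEYS - set(example)
        if missing:
            raise ValueError(f"example {i} is missing keys: {sorted(missing)}")
    return len(examples)


def check_data_file(path):
    """Load one data file from disk and validate it (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist; download it first")
    with path.open(encoding="utf-8") as f:
        return validate_examples(json.load(f))
```

Calling check_data_file on the two paths you intend to pass as train_file and predict_file, before running sh run_train_beam_retr.sh, should surface a missing or truncated download early instead of partway through training.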

meaningful96 commented 1 month ago

Thank you for your response. I was initially confused because I thought I needed all the preprocessed datasets sampled by IRCoT, but now I understand and believe I can handle it on my own. Thank you again for your help.

canghongjian commented 1 month ago

Feel free to contact me if you have any other questions!