meaningful96 opened this issue 1 month ago
Hi @meaningful96 , sorry for the late reply.
Hi @canghongjian, thank you for replying. It was helpful. I finally understand what a 'supporting fact' is in each of the datasets, but I'm still facing an issue with computational cost. In your code the training epochs are set to 12-16, and as I continue running the code I've now reached the 5th epoch.
I am currently training on the HotpotQA dataset using the run_train_beam_retr.sh script, and it has already taken over 100 hours to complete 5 epochs. If I continue to 12 epochs, it is estimated to take more than 220 hours in total. Did training take you this long as well?
@meaningful96 As for the training speed, you can refer to https://github.com/canghongjian/beam_retriever/issues/2 .
Hello,
I have a couple of questions regarding your setup.
First, in your code (run_train_beam_retr.sh) the training batch size for Beam Retrieval is set to 8, while the paper reports a batch size of 16 for retriever training. Training with a batch size of 8 takes quite a long time, so I would like to use the largest batch size I can. Does Beam Retrieval's performance change significantly with batch size?
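One common way to reconcile the two settings without increasing memory use is gradient accumulation: two micro-batches of 8 combined into one optimizer step produce the same update as a single batch of 16 (for a loss that averages over the batch). This is a minimal plain-Python sketch with a toy linear model; the names and the model are illustrative, not taken from the Beam Retriever training loop.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error of y_hat = w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def sgd_accumulated(w, micro_batches, lr=0.1, accum_steps=2):
    """One optimizer step built from `accum_steps` micro-batches."""
    grad = 0.0
    for xs, ys in micro_batches:
        # Scale each micro-batch gradient so the accumulated sum equals
        # the gradient of the full (concatenated) batch.
        grad += grad_mse(w, xs, ys) / accum_steps
    return w - lr * grad

# Two micro-batches of 8 give the same update as one batch of 16.
xs = [float(i) for i in range(16)]
ys = [2.0 * x for x in xs]
w0 = 0.0
w_accum = sgd_accumulated(w0, [(xs[:8], ys[:8]), (xs[8:], ys[8:])])
w_full = w0 - 0.1 * grad_mse(w0, xs, ys)
assert abs(w_accum - w_full) < 1e-9
```

In a PyTorch loop the same idea is usually expressed by dividing each micro-batch loss by the accumulation steps before `backward()` and calling `optimizer.step()` only every `accum_steps` iterations.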
Secondly, for the HotpotQA dataset the paper mentions that models like FE2H or R^3 are used as readers, but DeBERTa-large is used for the supervised-learning setting. Does the "supervised setting" refer specifically to the experiments involving supporting sentences in the main results? Could you explain what the supervised setting entails and clarify what is meant by "supporting sentences"? As far as I understand, "Answer" measures whether the predicted answer string is correct, while "Supporting Sentence" measures whether the model extracted the sentences needed to derive that answer. I'm still unsure about the distinction between the 'Answer' and 'Supporting' results.
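For what it's worth, here is a minimal sketch of how the two metric families differ in HotpotQA-style evaluation: Answer EM/F1 compares predicted vs. gold answer strings, while Supporting EM/F1 compares the *set* of predicted (title, sentence_index) pairs against the gold supporting facts. This is modeled loosely on the official HotpotQA evaluation logic; the function and variable names here are illustrative, not from the Beam Retriever code.

```python
def sp_metrics(pred_sp, gold_sp):
    """EM and F1 over supporting-fact (title, sentence_index) pairs."""
    pred, gold = set(map(tuple, pred_sp)), set(map(tuple, gold_sp))
    tp = len(pred & gold)  # correctly retrieved supporting sentences
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    em = float(pred == gold)  # exact match requires the full gold set
    return em, f1

# Example: the model finds one of two gold sentences plus a spurious one,
# so Supporting EM is 0 even though half the evidence was retrieved.
gold = [("Scott Derrickson", 0), ("Ed Wood", 0)]
pred = [("Scott Derrickson", 0), ("Scott Derrickson", 3)]
em, f1 = sp_metrics(pred, gold)  # em = 0.0, f1 = 0.5
```

So a model can answer correctly (high Answer EM) while still missing evidence sentences (lower Supporting EM/F1), which is exactly the distinction the two result columns capture.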