Hi,
I really appreciate your inspiring work and your open-sourcing of the code. The detailed explanation in the paper together with the complete code has helped me a lot.
I have been running experiments with your code but ran into an issue: a substantial portion of the inputs in the dataset (questions with their appended passages) exceeds the tokenizer's length limit. The text is therefore truncated, which in turn leads to suboptimal or even poor responses from the T5 model. I am curious whether using T5 (and truncating the input) is an appropriate and fundamentally sound choice for long-form QA tasks. I understand that your primary focus is on using reward models for RLHF, but I wonder about the effectiveness of models trained on truncated text. For instance, the three different reward models and the pretrained T5-1k, when trained under these conditions, might not perform optimally. Could the integrity of the results be compromised by training on truncated inputs?
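For reference, here is a minimal sketch (not taken from your repo) of how I estimated how often truncation occurs; the checkpoint name and the `question`/`passages` field names are assumptions on my side and may differ from your actual data format:

```python
from transformers import T5Tokenizer

# Assumed checkpoint; t5-base has model_max_length == 512
tokenizer = T5Tokenizer.from_pretrained("t5-base")
max_len = tokenizer.model_max_length

def truncation_rate(examples, max_length=max_len):
    """Fraction of (question + passages) inputs longer than max_length tokens."""
    over = 0
    for ex in examples:
        # Hypothetical field names; adjust to the real dataset schema
        text = ex["question"] + " " + ex["passages"]
        n_tokens = len(tokenizer(text, truncation=False)["input_ids"])
        if n_tokens > max_length:
            over += 1
    return over / max(len(examples), 1)

# e.g. print(f"{truncation_rate(my_examples):.1%} of inputs would be truncated")
```

In my runs a large share of examples go well past the limit, which is what prompted the question above.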