Closed: arielge closed this issue 1 year ago
Thank you for your question. I don't think adding or removing that token affects model accuracy. If you want to build a new reward model, you can refer to this repo, which has much cleaner code: https://github.com/Dahoas/reward-modeling
🐛 Describe the bug
Hi, there is something that is slightly unclear to me in the summarize_rlhf code. I see that the tokenizer used everywhere is the pretrained EleutherAI/gpt-j-6B tokenizer, where the only modification made to it is defining the padding token as the EOS token. However, I see in the data processing function that a "<|startoftext|>" string is added to the examples (e.g., https://github.com/CarperAI/trlx/blob/888ae11e59ae4f8e3232b0d1beb26567886ad72e/examples/summarize_rlhf/reward_model/train_reward_model_gptj.py#L38), although this is not actually a special token recognized by this specific tokenizer. I assume the code works fine as-is (even with the slightly strange behavior that "<|startoftext|>" is split by the tokenizer into several regular tokens). Since my current aim is to use the released reward model checkpoint, I mainly wanted to ask whether this is indeed the data processing behavior that was used when training the reward model. Thank you!
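For illustration (this snippet is not from the repo), here is a minimal sketch of what I mean. The example text is just a placeholder, and the exact token pieces shown in the comments are my expectation for GPT-J's GPT-2-style BPE, not verified output:

```python
from transformers import AutoTokenizer

# Load the pretrained GPT-J tokenizer, as in the summarize_rlhf example.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# The only modification made in the example code: pad with the EOS token.
tokenizer.pad_token = tokenizer.eos_token

# "<|startoftext|>" is not registered as a special token for this tokenizer,
# so (as I understand it) it gets split into several ordinary BPE pieces
# instead of being mapped to a single id.
text = "<|startoftext|>" + "Some post text. TL;DR: a short summary" + tokenizer.eos_token
ids = tokenizer(text)["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids)[:8])
# Expected: pieces like '<', '|', 'start', 'of', 'text', ... rather than one token.

# By contrast, the EOS token is a registered special token and maps to a single id.
print(tokenizer.eos_token_id)
```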
Which trlX version are you using?
No response
Additional system and package information
No response