CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License

Special tokens in RLHF reward model code #292

Closed: arielge closed this issue 1 year ago

arielge commented 1 year ago

🐛 Describe the bug

Hi, there is something slightly unclear to me in the summarize_rlhf code. The tokenizer used everywhere is the pretrained EleutherAI/gpt-j-6B tokenizer, and the only modification made to it is setting the padding token to the EOS token. However, in the data processing function a "<|startoftext|>" string is added to the examples (e.g., https://github.com/CarperAI/trlx/blob/888ae11e59ae4f8e3232b0d1beb26567886ad72e/examples/summarize_rlhf/reward_model/train_reward_model_gptj.py#L38), even though this is not a special token recognized by this particular tokenizer.

I assume the code works fine as-is (despite the slightly odd behavior that "<|startoftext|>" gets split by the tokenizer into several regular tokens). Since my current aim is to use the released reward model checkpoint, I mainly wanted to ask whether this is indeed the data processing behavior that was used when training the reward model. Thank you!
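A minimal sketch of the behavior described above, assuming the standard Hugging Face transformers API and the same EleutherAI/gpt-j-6B tokenizer (the exact BPE split may differ from the comment shown):

```python
from transformers import AutoTokenizer

# Load the pretrained tokenizer as in the summarize_rlhf example,
# with the padding token set to the EOS token.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token

# "<|startoftext|>" is not registered as a special token for this tokenizer,
# so it is split into several ordinary BPE pieces rather than one id.
print(tokenizer.tokenize("<|startoftext|>"))
# e.g. ['<', '|', 'start', 'of', 'text', '|', '>']  (exact split may vary)

# By contrast, the EOS token is a registered special token and stays intact.
print(tokenizer.tokenize("<|endoftext|>"))  # ['<|endoftext|>']
```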

Which trlX version are you using?

No response

Additional system and package information

No response

PhungVanDuy commented 1 year ago

Thank you for your question. I don't think adding or removing that token affects model accuracy. If you want to build a new reward model, you can refer to this repo, where the code is much cleaner: https://github.com/Dahoas/reward-modeling.
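For completeness, if someone does want "<|startoftext|>" to be a single special token when training a new reward model, a hypothetical sketch would look like the following. This is an assumption about how one could do it, not what the released checkpoint was trained with:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical alternative: register "<|startoftext|>" as a real special token.
# NOTE: the released reward model checkpoint was NOT trained this way.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.add_special_tokens({"bos_token": "<|startoftext|>"})
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
# The embedding matrix must grow to cover the newly added token id.
model.resize_token_embeddings(len(tokenizer))

# The marker now maps to a single id instead of several BPE pieces.
print(tokenizer.tokenize("<|startoftext|>"))  # ['<|startoftext|>']
```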