Closed hank0316 closed 7 months ago
Thanks for raising this. Let's take a look. Maybe @ValentinaPy (who mentioned some interest in continuing on this project) can take a look as well.
Also @hank0316 -- if you want to open a PR with that solution we can test it further!
@natolambert, I have opened the PR. Would you kindly review it? Additionally, I apologize for inadvertently pressing the close button; I am not familiar with adding comments to an issue.
Hey @natolambert, I have a question about training a reward model. Do you think it's necessary to incorporate chat templates during data preprocessing for RM training? And if so, should the template align with the one used in SFT?
Yes @hank0316, chat templates are important. I think there can be slight differences (e.g. for RM you aren't generating afterwards, iirc), but it should match at a high level.
Sure, @natolambert! I appreciate your response and this fantastic benchmark. I'm also curious whether any models on the leaderboard use the tulu chat template for evaluation but use their own chat template in SFT. In other words, whether the template a model used in SFT differs from the one used in this benchmark.
@hank0316 nope, not to my knowledge. Most use the tokenizer's implementation.
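To make the template point concrete: for RM preprocessing, the usual pattern is to render the full (prompt, completion) conversation with the tokenizer's chat template and not append a generation prompt, since the reward model scores a finished response rather than generating one. A minimal sketch using Hugging Face's `apply_chat_template` (the model name and example data below are placeholders, not taken from this thread):

```python
from transformers import AutoTokenizer

# Placeholder model name; any tokenizer that ships a chat template works the same way.
tokenizer = AutoTokenizer.from_pretrained("your-rm-base-model")

def format_for_rm(prompt: str, completion: str) -> str:
    """Render a (prompt, completion) pair with the chat template for RM scoring."""
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ]
    # add_generation_prompt=False: the assistant turn is already present,
    # so the reward model just scores the finished conversation.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )

chosen_text = format_for_rm("What is 2 + 2?", "2 + 2 = 4.")
rejected_text = format_for_rm("What is 2 + 2?", "2 + 2 = 5.")
# chosen_text / rejected_text are then tokenized and used as the preference pair.
```

Using the same template at evaluation time means the RM sees inputs in the format it was trained on.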
Hi,

I encountered a problem with `pad_token_id`. I trained a TinyLlama reward model by modifying the TRL sample code, and I want to use this benchmark to evaluate it. I added an entry for my model to `REWARD_MODEL_CONFIG`.
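The exact entry isn't reproduced in this thread; as a rough sketch of the idea (the keys and builders below are assumptions about the config layout, not the code that was actually added):

```python
from transformers import AutoModelForSequenceClassification, pipeline

# Hypothetical entry: the keys shown here are assumed, not copied from the actual change.
REWARD_MODEL_CONFIG["TinyLlama/TinyLlama-1.1B-Chat-v0.5"] = {
    "model_builder": AutoModelForSequenceClassification.from_pretrained,
    "pipeline_builder": pipeline,
    "quantized": True,
    "custom_dialogue": False,
}
```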
Then I ran the evaluation with

`python scripts/run_rm.py --model=TinyLlama/TinyLlama-1.1B-Chat-v0.5 --chat_template=TinyLlama --do_not_save`

and got an error.
I think this error message means that there is no `pad_token_id` in the model config, although the `pad_token_id` does exist in the tokenizer. As a result, I added two lines of code under line 185 of `scripts/run_rm.py`. I would appreciate it if someone could review this modification to confirm that it correctly addresses the issue. If there are any concerns or alternative approaches to consider, please let me know.
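For reference, a minimal sketch of the two-line idea described above, assuming `model` and `tokenizer` objects like the ones built in `scripts/run_rm.py` (the variable names and surrounding loading code here are assumptions, not the exact patch):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v0.5"  # model under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# The two-line fix: if the model config lacks a pad token, copy it from the tokenizer
# so that padded batches can be built during evaluation.
if model.config.pad_token_id is None and tokenizer.pad_token_id is not None:
    model.config.pad_token_id = tokenizer.pad_token_id
```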
Thank you for your attention to this matter.
Best regards, Hank