Open qwenzo opened 1 month ago
Hi,

First of all, thank you very much for the repo! I would like to ask: is the EOS token in the reward model dataset necessary for the model? I'm using a GPT-2 model, where the EOS token also serves as the BOS token, so a dedicated EOS is not normally used. That's why I was wondering whether this token is needed for reward modeling or whether it is model-specific.

https://github.com/OpenLLMAI/OpenRLHF/blob/072e286a5c5f3cd6acf2c9ad7e4ef727a8dedb83/openrlhf/datasets/reward_dataset.py#L148

Thank you!

Usually, we use the EOS token to output the reward value, so we did not consider the GPT-2 case.
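For context, here is a minimal sketch of that pattern, assuming a scalar value head over the backbone's final hidden states; the names (`value_head`, `reward`) are illustrative, not OpenRLHF's actual API:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

backbone = AutoModel.from_pretrained("gpt2")
value_head = torch.nn.Linear(backbone.config.hidden_size, 1)  # hypothetical reward head

# Append EOS to each sample, as the reward dataset does.
texts = ["prompt and chosen response" + tokenizer.eos_token]
batch = tokenizer(texts, return_tensors="pt", padding=True)

hidden = backbone(**batch).last_hidden_state   # (batch, seq, hidden)
values = value_head(hidden).squeeze(-1)        # (batch, seq)

# The reward is read at the last non-padding position, i.e. the appended EOS.
eos_index = batch["attention_mask"].sum(dim=1) - 1
reward = values.gather(1, eos_index.unsqueeze(1)).squeeze(1)  # (batch,)
```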
Hi, thank you for the reply. What is a possible way to fix this? Should I use a new special token?
You can try to allocate an unused special token instead.
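One way to do that with the Hugging Face `transformers` API is sketched below; the token string `"<|reward|>"` is an arbitrary placeholder, not something the repo defines:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Register a dedicated token so it no longer collides with GPT-2's shared BOS/EOS.
tokenizer.add_special_tokens({"additional_special_tokens": ["<|reward|>"]})

model = AutoModel.from_pretrained("gpt2")
# The new token needs an embedding row, so resize to the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))

# Append the dedicated token (instead of EOS) when building reward samples,
# then read the reward value at its position as in the snippet above.
text = "prompt and chosen response" + "<|reward|>"
ids = tokenizer(text, return_tensors="pt")
```

Note that the newly added embedding row is randomly initialized, so the reward model should be fine-tuned after the resize rather than used zero-shot.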