RLHFlow / RLHF-Reward-Modeling

Recipes to train reward models for RLHF.
https://rlhflow.github.io/
Apache License 2.0

Regarding the Gemma2 Reward Model Structure #26

Open Loong435 opened 4 months ago

Loong435 commented 4 months ago

I tried to reproduce your Gemma 2B reward model training and found that the reward model fine-tuned from internlm2 has an output head of size 1. However, when I downloaded your GRM-Gemma-2B-sftreg reward model, I found that its final linear layer outputs two values. While debugging Bradley-Terry (BT) training, I confirmed that the final linear layer of the reward model trained by your code also has a single output. I also noticed that the training script passes the 'chosen' and 'rejected' responses through the model separately to obtain a reward for each before computing the loss. How was GRM-Gemma-2B-sftreg trained? After evaluating it, my impression is that the two output values are a 'chosen' score and a 'rejected' score. Could you explain?

WeiXiongUST commented 4 months ago

@YangRui2015 could you look into this?

YangRui2015 commented 4 months ago

Hi, the model Ray2333/GRM-Gemma-2B-sftreg outputs only one value and does not follow the original AutoModelForSequenceClassification class. It seems you may not have loaded it correctly. Please refer to the example here for the correct loading procedure.
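
For readers following the thread, here is a minimal sketch of why a single-output head is enough for Bradley-Terry training: the same scalar reward function scores the 'chosen' and 'rejected' responses in two separate passes, and the loss depends only on the difference between the two scores. This is an illustration under stated assumptions, not the repository's training code; `toy_reward`, its feature vectors, and its weights are hypothetical stand-ins for a real reward model.

```python
import math


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def toy_reward(features, weights, bias=0.0):
    """Hypothetical stand-in for a reward model with ONE scalar output."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical weights and inputs, chosen only for illustration.
weights = [0.5, -1.0]
chosen_features = [2.0, 0.5]
rejected_features = [1.0, 1.0]

# Two separate passes through the same single-output scorer.
r_chosen = toy_reward(chosen_features, weights)
r_rejected = toy_reward(rejected_features, weights)

loss = bt_loss(r_chosen, r_rejected)
print(f"r_chosen={r_chosen:.2f} r_rejected={r_rejected:.2f} loss={loss:.4f}")
```

Under this setup no second output is needed: a head that emits two values for one input would more likely indicate a different head being attached at load time than separate 'chosen' and 'rejected' scores.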