OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

clarification on config std and mean calculation #224

Closed karthik-nexusflow closed 4 months ago

karthik-nexusflow commented 4 months ago

Hi team, we would like to drop in our own reward model, and we would like to know how the following values were calculated and added to the config. Were they calculated by running reward inference on the entire reward training dataset?

```python
if hasattr(config, "mean"):
    self.mean[0] = config.mean
    self.std[0] = config.std
```

hijkzzz commented 4 months ago

The mean and std are computed over the RM's test data at the end of the RM training stage.

wuxibin89 commented 4 months ago

https://github.com/OpenLLMAI/OpenRLHF/blob/main/openrlhf/trainer/rm_trainer.py#L219-L226
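For anyone dropping in a custom reward model, the linked code can be approximated by the sketch below: run the trained RM over its evaluation split, collect the scalar rewards, and store their mean/std in the model config so PPO can later normalize rewards as `(r - mean) / std`. The function and variable names here are illustrative, not the actual OpenRLHF API.

```python
import torch

def compute_reward_stats(model, eval_batches, device="cpu"):
    """Return (mean, std) of scalar rewards over an evaluation set.

    Assumes the model maps a batch of inputs to one scalar reward per
    sequence; adapt the forward call to your RM's actual interface.
    """
    rewards = []
    model.eval()
    with torch.no_grad():
        for batch in eval_batches:
            r = model(batch.to(device))          # one reward per sequence
            rewards.append(r.flatten().cpu())
    all_rewards = torch.cat(rewards)
    return all_rewards.mean().item(), all_rewards.std().item()

# Illustrative usage: persist the stats so they end up in config.json,
# which is where the snippet above reads them back from.
# mean, std = compute_reward_stats(rm, eval_loader)
# rm.config.mean = mean
# rm.config.std = std
# rm.save_pretrained(save_dir)
```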

karthik-nexusflow commented 4 months ago

Great, thanks!