OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

RM training loss becomes NaN after the first training step. #288

Open lixsh6 opened 1 month ago

lixsh6 commented 1 month ago

I used a large model (>170B) as my reward model. At the very beginning, the loss is normal, but after one training step it becomes NaN. This did not happen when I used a smaller base model (e.g., 30B) to train the RM. Do you have any suggestions?
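[Editor's note: not from this thread, but a common first diagnostic is to guard the training step with a finiteness check so the offending step can be logged before NaNs propagate into the weights. A minimal sketch in PyTorch, assuming a standard pairwise RM loss; `reward_model`, `optimizer`, and the batch keys are hypothetical names, not OpenRLHF APIs:]

```python
import torch
import torch.nn.functional as F

def train_step(reward_model, optimizer, batch, step):
    # Pairwise RM loss: -log(sigmoid(r_chosen - r_rejected)).
    chosen_reward = reward_model(batch["chosen_ids"])
    rejected_reward = reward_model(batch["rejected_ids"])
    loss = -F.logsigmoid(chosen_reward - rejected_reward).mean()

    # Guard: skip the update and report if the loss is non-finite,
    # instead of letting NaNs corrupt the weights.
    if not torch.isfinite(loss):
        print(f"step {step}: non-finite loss {loss.item()}, skipping update")
        optimizer.zero_grad()
        return None

    loss.backward()
    # Clip gradients; exploding gradients on large models often precede NaNs.
    torch.nn.utils.clip_grad_norm_(reward_model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```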

hijkzzz commented 1 month ago

We don't have such a large model to test with; it may be related to DeepSpeed.
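[Editor's note: if DeepSpeed is the culprit, a common cause of NaN loss on very large models is fp16 loss-scale overflow/underflow. One thing to try, sketched below as an illustrative DeepSpeed config fragment (values are assumptions, not a confirmed fix for this issue), is switching mixed precision from fp16 to bf16 and enabling gradient clipping:]

```python
# Illustrative DeepSpeed config fragment: bf16 has the same dynamic range
# as fp32 and needs no loss scaling, which often avoids fp16-related NaNs.
ds_config = {
    "bf16": {"enabled": True},  # instead of "fp16": {"enabled": True}
    "gradient_clipping": 1.0,
    "zero_optimization": {"stage": 3},
}
```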