OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

Adding a length penalty to the reward #236

Open karthik-nexusflow opened 4 months ago

karthik-nexusflow commented 4 months ago

Hi team, while using the PPO pipeline we occasionally observe spikes in response length. Are any length-penalty techniques available, or have any been explored?

hijkzzz commented 4 months ago

Try increasing the KL penalty.
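For illustration only, here is a minimal sketch of how a KL penalty and an explicit length penalty can both enter the per-token reward in a PPO-style RLHF setup. It is not OpenRLHF's implementation: the function `shaped_rewards` and the parameters `kl_coef`, `length_coef`, and `target_len` are hypothetical names, and the linear penalty on tokens beyond `target_len` is just one simple choice among many.

```python
from typing import List


def shaped_rewards(
    reward: float,
    logprobs: List[float],
    ref_logprobs: List[float],
    kl_coef: float = 0.01,      # hypothetical name for the KL penalty weight
    length_coef: float = 0.0,   # hypothetical weight for the length penalty
    target_len: int = 256,      # hypothetical length above which tokens are penalized
) -> List[float]:
    """Return per-token rewards for one sampled response.

    Every token gets a KL penalty; the last token additionally gets the
    sequence-level reward minus a linear penalty on excess length.
    """
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs and ref_logprobs must be non-empty and equal in length")

    # Simple per-token KL estimate: log pi(token) - log pi_ref(token).
    rewards = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]

    # Penalize only the tokens beyond target_len.
    excess = max(0, len(logprobs) - target_len)
    rewards[-1] += reward - length_coef * excess
    return rewards


if __name__ == "__main__":
    policy = [-1.0, -1.0, -1.0]
    reference = [-1.5, -1.5, -1.5]
    print(shaped_rewards(1.0, policy, reference, kl_coef=0.1))
    print(shaped_rewards(1.0, policy, reference, kl_coef=0.1,
                         length_coef=0.5, target_len=2))
```

With `length_coef=0.0` this reduces to the plain KL-shaped reward, so raising `kl_coef` alone lowers the return of any response whose tokens drift from the reference model. Long responses accumulate that penalty over more tokens, which may be why a larger KL penalty can damp length spikes.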