OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

fix num_training_steps when micro rollout and train size are not equal #248

Closed wuxibin89 closed 3 months ago

wuxibin89 commented 3 months ago
len(prompts_dataloader) == prompts // micro_rollout_batch_size

Because the dataloader length is derived from micro_rollout_batch_size while optimizer updates are driven by the train batch sizes, num_update_steps_per_episodes is computed incorrectly when micro_rollout_batch_size and micro_train_batch_size are not equal, which makes the cosine learning rate scheduler not work as expected.
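A minimal sketch of the corrected step count, counting rollout samples rather than dataloader steps. The function and parameter names here (estimate_max_steps, num_prompts, train_batch_size, max_epochs, num_episodes) are illustrative assumptions based on the PR description, not the exact OpenRLHF implementation.

```python
import math


def estimate_max_steps(
    num_prompts: int,
    micro_rollout_batch_size: int,
    train_batch_size: int,
    max_epochs: int,
    num_episodes: int,
) -> int:
    """Rough count of optimizer update steps, e.g. for the LR scheduler.

    len(prompts_dataloader) == num_prompts // micro_rollout_batch_size,
    so each dataloader step produces micro_rollout_batch_size rollout
    samples. Each optimizer update then consumes train_batch_size of
    those samples (micro_train_batch_size per micro step, accumulated).
    Counting in samples keeps the estimate correct even when
    micro_rollout_batch_size != micro_train_batch_size.
    """
    dataloader_len = num_prompts // micro_rollout_batch_size
    rollout_samples_per_episode = dataloader_len * micro_rollout_batch_size
    updates_per_episode = rollout_samples_per_episode * max_epochs // train_batch_size
    return math.ceil(num_episodes * updates_per_episode)
```

Deriving the update count from the number of rollout samples (instead of from len(prompts_dataloader) alone) is what decouples it from the ratio between the two micro batch sizes.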