OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

Is save checkpoint not yet supported for ppo ray trainer? #256

Open · mickel-liu opened this issue 3 months ago

mickel-liu commented 3 months ago

When I set save_step to any value other than -1, the program raises the following exception:

self.actor.model, os.path.join(args.ckpt_path, "_actor"), tag, args.max_ckpt_num, args.max_ckpt_mem
AttributeError: 'Namespace' object has no attribute 'ckpt_path'

https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_trainer.py#L378-L385

These three args (ckpt_path, max_ckpt_num, max_ckpt_mem) are indeed not defined in train_ppo_ray.py, and I don't see args.save_path being used either.
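
For context, a minimal sketch of what registering these arguments in train_ppo_ray.py might look like. The argument names come from the attributes referenced in ppo_trainer.py; the default values below are illustrative placeholders, not the project's actual defaults:

```python
import argparse

parser = argparse.ArgumentParser()
# ... existing train_ppo_ray.py arguments ...

# Checkpoint-related arguments that ppo_trainer.py references but
# train_ppo_ray.py does not currently define. Defaults here are
# placeholders, not OpenRLHF's real defaults.
parser.add_argument("--ckpt_path", type=str, default="./ckpt/checkpoints_ppo_ray")
parser.add_argument("--max_ckpt_num", type=int, default=3)      # keep at most this many checkpoints
parser.add_argument("--max_ckpt_mem", type=int, default=1000)   # rough size budget for kept checkpoints

args = parser.parse_args()
```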

I did see this issue mentioned in #133; wondering if there's any update.

hijkzzz commented 3 months ago

Yes, we haven't fully developed and tested this feature yet. Contributions are welcome.

mickel-liu commented 3 months ago

I'm happy to look into it, but how have you been saving your models so far?

suehyunpark commented 1 month ago

Hi @mickel-liu, have you figured this out? I have no choice but to use train_ppo_ray.py for PPO instead of train_ppo.py, because the Ray version doesn't OOM during model loading in my configuration. I'm looking into ways to save checkpoints during/after training and was hoping you had looked into this feature as well.

mickelliu commented 1 month ago

Hi, I did look into the code and found that the checkpoint-saving feature is not yet implemented. But saving checkpoints wasn't actually what I was looking for: I wanted the actual model checkpoints, not the intermediate training states that "checkpoints" refer to in this repo. So I changed the code in my fork, and it now saves a model checkpoint after a preset number of iterations. Here's the code in my fork: https://github.com/mickelliu/OpenRLHF/blob/a7f21aa26ac027fcf30ca1c588e01cf07c67cb6f/openrlhf/trainer/ppo_trainer.py#L428-L442

Regardless of whether the ckpt feature gets officially implemented, train_ppo_ray.py will save a model checkpoint at the end of training.
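
For anyone looking for a similar workaround, the change roughly amounts to the sketch below. This is not the exact code from the linked fork; the strategy.save_model signature, the save_steps flag name, and the attribute names are assumptions based on how the final model save is done elsewhere in the trainer:

```python
import os


def maybe_save_model(trainer, args, global_steps):
    """Periodically save a full model checkpoint during PPO training.

    Sketch only: `trainer.strategy.save_model(model, tokenizer, path)` mirrors
    how OpenRLHF saves the final model at the end of training, but the exact
    signature and attribute names here are assumptions, not verified API.
    """
    # save_steps <= 0 keeps periodic saving disabled (the -1 convention above);
    # the flag may be spelled save_step or save_steps depending on the script.
    if args.save_steps > 0 and global_steps % args.save_steps == 0:
        save_dir = os.path.join(args.save_path, f"global_step{global_steps}")
        trainer.strategy.save_model(trainer.actor, trainer.tokenizer, save_dir)
```

Calling something like this once per PPO step (with global_steps tracked by the trainer) would write out a usable model directory every save_steps iterations, which is what the fork above does in spirit.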

suehyunpark commented 1 month ago

Thanks for the quick reply and for sharing your code! I'm glad to know that saving the trained model is that simple. Although the checkpointing feature would be a great addition, this fix seems to solve my issue.