-
Currently, I use `ppo_trainer.save_pretrained` to save a model that is still in training, because the machine I use is rather unstable and I often need to resume training should it be interrupted…
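In case it helps, this is the save/resume pattern I mean, as a minimal sketch. It assumes the classic (pre-0.12) TRL `PPOTrainer` API with an `AutoModelForCausalLMWithValueHead`; the checkpoint path and base model are illustrative, and note that `save_pretrained` persists model weights but not optimizer state:

```python
# Minimal save/resume sketch (illustrative; assumes the classic TRL API).
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

CKPT = "ppo_checkpoint"  # hypothetical local path
BASE = "gpt2"            # hypothetical base model

def make_trainer(name: str) -> PPOTrainer:
    """Build a PPOTrainer from either a base model or a saved checkpoint."""
    model = AutoModelForCausalLMWithValueHead.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    tokenizer.pad_token = tokenizer.eos_token
    return PPOTrainer(config=PPOConfig(), model=model, tokenizer=tokenizer)

# Fresh run: checkpoint periodically inside the training loop.
ppo_trainer = make_trainer(BASE)
# ... ppo_trainer.step(queries, responses, rewards) ...
ppo_trainer.save_pretrained(CKPT)  # saves weights, not optimizer state

# After an interruption: rebuild from the checkpoint and keep training.
ppo_trainer = make_trainer(CKPT)
```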
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
[2024-06-07 10:17:14,980] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator t…
-
### What happened + What you expected to happen
I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…
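For context, these are the MuJoCo settings from the paper that I'm trying to match, expressed as a hedged sketch against RLlib's classic `PPOConfig` API (parameter names may differ across RLlib versions, and the environment is just an example task from the paper):

```python
# Sketch: RLlib PPO with the MuJoCo hyperparameters from Schulman et al.
# (2017). Assumes the classic PPOConfig API; names vary across versions.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("HalfCheetah-v4")   # example MuJoCo task from the paper
    .training(
        lr=3e-4,                 # Adam step size
        train_batch_size=2048,   # horizon T per policy update
        sgd_minibatch_size=64,   # minibatch size
        num_sgd_iter=10,         # optimization epochs per update
        gamma=0.99,              # discount factor
        lambda_=0.95,            # GAE lambda
        clip_param=0.2,          # clipping epsilon
    )
)
algo = config.build()
for _ in range(100):
    results = algo.train()
```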
-
Hi, awesome work!
I am interested in how to train a skilled policy with PPO. Would you be able to provide the training code? It would be really helpful. Thank you!
-
I see your codebase has some features not mentioned in your paper, such as support for Lean4, DPO, and PPO. Do you have docs for Lean4 and for all the scripts in the root directory?
-
I noticed that the PPO agent initialization forces `is_action_continuous=False`, whereas the PPO algorithm itself and other libraries implementing PPO allow continuous actions. Can this be added to …
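For reference, continuous-action PPO usually just swaps the categorical policy head for a diagonal Gaussian over actions. A generic PyTorch sketch of what such a head looks like (all names are illustrative, not from this repo):

```python
# Illustrative diagonal-Gaussian actor head for continuous-action PPO
# (generic PyTorch; not this repository's code).
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # State-independent log-std, a common PPO choice.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mu(obs), self.log_std.exp())

# Usage in the PPO ratio/loss: sum log-probs over action dimensions.
actor = GaussianActor(obs_dim=8, act_dim=2)
dist = actor(torch.randn(32, 8))
action = dist.sample()                    # shape (32, 2)
logp = dist.log_prob(action).sum(-1)      # shape (32,)
```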
-
When performing a PPO step, the code performs the forward pass at [line 798](https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py) using the function `batched_forward_pass`.
However, …
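For readers unfamiliar with this step: `batched_forward_pass` recomputes per-token log-probabilities (and values) for the sampled query+response pairs in mini-batches. A generic sketch of the log-prob part, not TRL's exact implementation (`response_start` and the padding layout are hypothetical, for illustration):

```python
# Generic sketch of recomputing per-token log-probs for responses under
# a causal LM (illustrative; not TRL's exact batched_forward_pass).
import torch

@torch.no_grad()
def response_logprobs(model, input_ids, attention_mask, response_start):
    """Per-token log-probs of the response segment of each sequence.

    `input_ids` holds query+response rows padded to a common length;
    `response_start` is the index where responses begin.
    """
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Logits at position t predict token t+1, hence the shift by one.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    per_token = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return per_token[:, response_start - 1:]  # response positions only
```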
-
Hi @eleurent, thank you so much for the contribution. Could you explain how you figured out the DQN hyperparameters in the highway env? Did you use Optuna for optimizing the hyperparameter…
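For concreteness, this is roughly what I imagine an Optuna search would look like for DQN on highway-env, assuming stable-baselines3 (the search ranges, budgets, and env id are made up, and I don't know whether this is how the repo's values were actually found):

```python
# Sketch: tuning DQN hyperparameters on highway-env with Optuna.
# Illustrative only; not the procedure actually used in the repo.
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway envs)
import optuna
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    env = gym.make("highway-fast-v0")
    model = DQN(
        "MlpPolicy",
        env,
        learning_rate=trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        gamma=trial.suggest_float("gamma", 0.9, 0.999),
        batch_size=trial.suggest_categorical("batch_size", [32, 64, 128]),
        verbose=0,
    )
    model.learn(total_timesteps=20_000)
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```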
-
Does Optimum Neuron have support for [TRL](https://huggingface.co/docs/trl/index) supervised fine-tuning, reward modelling, and PPO using Trainium? Is TRL the best path to support RLHF?
-
# Implementing Proximal Policy Optimisation
I've used some of the [PyTorch RFC](https://github.com/pytorch/rfcs/blob/master/README.md) template here for clarity.
**Authors:**
* @salmanmohammadi…