-
There seem to be some issues with getting the environment set up. I tried two installation methods.
### Installation 1
One is to do
```
pip install -r requirements/requirements-envpool.txt…
-
`python main.py --task "fullplace" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 1 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.…
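The `--use-gae` flag in the command above enables Generalized Advantage Estimation. As a hedged illustration of what that computation does (a generic sketch, not this repository's actual code; the function name and signature are made up for illustration):

```python
# Minimal sketch of Generalized Advantage Estimation (GAE).
# Illustrates what a flag like --use-gae typically enables; not this repo's code.
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Return per-step advantages for one rollout.

    `values` holds the critic's value estimate at each step;
    `last_value` is the bootstrap value after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of residuals, discounted by gamma * lam
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

adv = compute_gae([1.0, 1.0], [0.5, 0.5], last_value=0.0)
# adv[1] = 0.5; adv[0] = 0.995 + 0.99 * 0.95 * 0.5 = 1.46525
```

The `lam` parameter trades bias for variance: `lam=0` reduces to one-step TD advantages, `lam=1` to full Monte Carlo returns minus the baseline.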
-
Given:
ppo_model = AutoModelForCausalLMWithValueHead.from_pretrained('path/to/my/AutoModelForCausalLM', torch_dtype="auto")
ppo_model.save_pretrained("ppo_model")
When I do:
ppo_model = AutoMode…
-
### When I run the following script
```
import torch
from accelerate import Accelerator, PartialState
from peft import LoraConfig
from tqdm import tqdm
from transformers import AutoTokenizer, …
-
Hi, awesome work!
I am interested in how to train a skilled policy with PPO. Would you be able to provide the training code? It would be really helpful. Thank you!
-
Hello Patrick,
I am implementing the PPO algorithm for a custom environment, and first wanted to test things out with a standard example, so I chose CartPole-v1 implemented with [`gym…
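For sanity-checking a PPO implementation like the one described above, it can help to verify the clipped surrogate objective in isolation. A minimal, generic sketch of that objective (not the code from this issue; the function name is made up for illustration):

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective for a single sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    `ratio` is pi_new(a|s) / pi_old(a|s); `advantage` is the advantage estimate.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# Positive advantage: gains from ratios above 1 + eps are clipped away.
clipped_surrogate(1.5, 2.0)   # min(3.0, 1.2 * 2.0) = 2.4
# Negative advantage: the unclipped (more pessimistic) term dominates.
clipped_surrogate(1.5, -2.0)  # min(-3.0, -2.4) = -3.0
```

Checking a few hand-computed cases like these before training on CartPole can rule out sign and clipping bugs early.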
-
When running the PPO baseline on my M1 Mac using the command `python ppo.py --save_policy`, I encounter `ValueError: Unrecognized name format` during the policy-saving process within the _save_network…
-
I noticed that RemoteExperienceMaker left-pads the input sequences even when using vllm for generation:
https://github.com/OpenLLMAI/OpenRLHF/blob/dcd379a44eea56625626d1a0832cd3eeda048b21/openrlhf/tr…
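As a generic illustration of the padding choice being discussed (not OpenRLHF's implementation; the helper below is made up for illustration): decoder-only models are usually left-padded for batched generation so that every prompt ends at the same position and newly generated tokens follow the prompt directly.

```python
def left_pad(sequences, pad_id=0):
    """Left-pad lists of token ids to equal length.

    Returns (padded, attention_mask); mask is 0 on pad positions, 1 on tokens.
    """
    max_len = max(len(s) for s in sequences)
    padded, mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        padded.append([pad_id] * n_pad + list(s))
        mask.append([0] * n_pad + [1] * len(s))
    return padded, mask

padded, mask = left_pad([[5, 6, 7], [8]])
# padded -> [[5, 6, 7], [0, 0, 8]]
# mask   -> [[1, 1, 1], [0, 0, 1]]
```

With right padding, generation for the shorter sequence would have to continue after pad tokens, which is why left padding is the common convention for causal-LM generation.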
-
**Describe the bug**
We followed [Accelerated-RLHF.md](https://github.com/NVIDIA/NeMo-Aligner/blob/v0.3.0.trtllm/Accelerated-RLHF.md) to accelerate PPO training using TensorRT-LLM. A…
-
Where is the code for robot kinematics and dynamics? Additionally, I would like to migrate our PPO project to ROS2 Gazebo for simulation. Do you have any suggestions or references?
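On the kinematics question, a generic starting point (not this repository's code; link lengths and function name are made up for illustration) is planar two-link forward kinematics:

```python
import math

def fk_2link(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) for a planar two-link arm.

    theta1, theta2 are joint angles in radians; l1, l2 are link lengths.
    """
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

fk_2link(0.0, 0.0)          # fully extended along x: (2.0, 0.0)
fk_2link(math.pi / 2, 0.0)  # pointing straight up: (~0.0, 2.0)
```

For a full simulation stack, ROS2 typically describes kinematics in a URDF, which Gazebo consumes directly, so porting usually means expressing the arm's links and joints in URDF rather than hand-coding equations like these.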