-
### 🚀 Feature
Hello,
following DLR-RM/stable-baselines3#1624, @SimRey and I would like to implement **Hybrid PPO** in this library.
[This](https://arxiv.org/pdf/1903.01344.pdf) is the pa…
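For readers unfamiliar with the setting: a hybrid (parameterized) action space pairs a discrete action choice with continuous parameters. A minimal sketch of such a space in Gymnasium terms, with purely illustrative names and sizes (none of this comes from the paper or from SB3):

```python
# Illustrative only: a hybrid/parameterized action space of the kind the
# linked paper (arXiv:1903.01344) targets - one discrete action choice plus
# continuous parameters. Key names and dimensions here are assumptions.
import numpy as np
from gymnasium import spaces

hybrid_action_space = spaces.Dict(
    {
        # which discrete action to take (e.g. 0=move, 1=turn, 2=kick)
        "action_id": spaces.Discrete(3),
        # continuous parameters; the chosen action reads only the slice it needs
        "params": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    }
)

sample = hybrid_action_space.sample()
print(sample["action_id"], sample["params"])
```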
-
### Feature request
Please provide example scripts in https://github.com/huggingface/trl/tree/main/examples/scripts/ppo showing how to create the corresponding SFT and RM checkpoints to use for PPO.
### …
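A rough sketch of what such an example might cover, using TRL's `SFTTrainer` and `RewardTrainer`. The model name, the datasets, and the exact trainer arguments below are placeholders and vary by TRL version; this is the shape of a solution, not an official recipe:

```python
# Hypothetical sketch, not an official TRL example: produce an SFT checkpoint
# and a reward-model (RM) checkpoint that a PPO script could then load.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardTrainer, SFTTrainer

base = "Qwen/Qwen2.5-0.5B"  # placeholder base model

# 1) Supervised fine-tuning -> SFT checkpoint
sft_trainer = SFTTrainer(
    model=base,
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
sft_trainer.train()
sft_trainer.save_model("./sft_checkpoint")

# 2) Reward modeling on preference pairs -> RM checkpoint
rm_trainer = RewardTrainer(
    model=AutoModelForSequenceClassification.from_pretrained(base, num_labels=1),
    processing_class=AutoTokenizer.from_pretrained(base),  # `tokenizer=` in older TRL
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
)
rm_trainer.train()
rm_trainer.save_model("./rm_checkpoint")
```

The PPO script would then point its policy at `./sft_checkpoint` and its reward model at `./rm_checkpoint`.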
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.8.3.dev0
- Platform: Linux-5.10.0-60.18.0.50.oe2203.aarch64-aarch64-with-gli…
-
Hello, when I run the Formation task with `algo=mappo`, I get:
![mappo error](https://github.com/btx0424/OmniDrones/assets/55371740/1d49b582-6bdf-4fe6-91ae-3171c23397b6)
When I use `algo=ppo`, I get:
…
-
### What happened + What you expected to happen
The problem concerns the numpy version when I restore a model from a checkpoint. Restoring works on numpy>=2.0.0 but fails on numpy==1.20; it only works on >=2.0.0. I can n…
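Until the root cause is found, one possible workaround (a sketch, assuming the restore runs inside Ray workers) is to fail fast on the driver and pin numpy for the workers via Ray's `runtime_env`:

```python
# Hedged workaround sketch: make sure both the driver and the Ray workers run
# the numpy version the checkpoint was saved with (>=2.0.0 per the report above).
import numpy as np
import ray
from packaging.version import Version

# Fail fast locally if the driver's numpy predates 2.0.0
assert Version(np.__version__) >= Version("2.0.0"), np.__version__

# Ensure remote workers also get a compatible numpy
ray.init(runtime_env={"pip": ["numpy>=2.0.0"]})
```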
-
Thank you to the authors for their contributions, such as the PRM training code added by @zhuzilin. My current question: can a trained PRM be used directly for step-by-step RLHF training? If not, what other application scenarios does a PRM currently have? Several issues say that PPO training is not supported yet, so can the trained PRM be used in other training modes, such as DPO?
Thank you very much!
#442
#498
#490
-
Hi! Could you tell me how to run evaluation for text control using the pretrained model?
Should we just change [this env](https://github.com/NVlabs/ProtoMotions/blob/main/data/pretrained_models/Maske…
-
Hey @vwxyzjn
it's been quite a few extremely busy months, but now I finally have the capacity to contribute a single-file implementation of PPO with Transformer-XL as episodic memory. The implement…
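For context, a rough sketch of the core idea (not the actual contribution): during the rollout, each step's hidden activations are cached per environment, and the policy attends over this episodic memory window on the next forward pass. A single attention layer stands in for the stacked Transformer-XL blocks here, and every name and shape is an assumption:

```python
# Rough sketch of TrXL-style episodic memory in a PPO policy. Assumptions
# throughout: shapes, names, and a single attention layer standing in for
# the real multi-layer Transformer-XL.
import torch
import torch.nn as nn


class MemoryPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, d_model: int = 64, mem_len: int = 16):
        super().__init__()
        self.mem_len = mem_len
        self.embed = nn.Linear(obs_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.pi = nn.Linear(d_model, n_actions)
        self.v = nn.Linear(d_model, 1)

    def forward(self, obs, memory):
        # obs: (batch, obs_dim); memory: (batch, mem_len, d_model)
        h = self.embed(obs).unsqueeze(1)       # (batch, 1, d_model)
        ctx = torch.cat([memory, h], dim=1)    # attend over past steps + current
        out, _ = self.attn(h, ctx, ctx)        # query is the current step only
        out = out.squeeze(1)
        # Slide the episodic memory window: drop the oldest step, append current.
        new_memory = torch.cat([memory[:, 1:], h.detach()], dim=1)
        return self.pi(out), self.v(out), new_memory


# Usage inside a rollout loop (memory is reset to zeros at episode start):
policy = MemoryPolicy(obs_dim=8, n_actions=4)
memory = torch.zeros(2, policy.mem_len, 64)    # (num_envs, mem_len, d_model)
obs = torch.randn(2, 8)
logits, value, memory = policy(obs, memory)
```

The `detach()` on the cached activations mirrors the usual Transformer-XL trick of stopping gradients through the memory.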
-
I have trained a model using LidarObservation as follows:
```python
model = PPO('MlpPolicy', env,
            policy_kwargs=dict(net_arch=[256, 256]),
            learning_rate=5e-4,
            n_s…
```
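For reference, a complete version of this setup might look like the sketch below. It assumes `LidarObservation` comes from highway-env; the env id and every value after `learning_rate` (including the guess that the truncated `n_s…` is `n_steps`) are illustrative, not the poster's actual settings:

```python
# Hedged reconstruction of the truncated snippet above (assumptions noted inline).
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-v0 family of envs)
from stable_baselines3 import PPO

env = gym.make("highway-v0")  # assumed env id
env.unwrapped.configure({"observation": {"type": "LidarObservation"}})
env.reset()  # re-reset so the new observation type takes effect

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[256, 256]),
    learning_rate=5e-4,
    n_steps=2048,        # assumption: the truncated "n_s…" is likely n_steps
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ppo_lidar")
```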
-
Now that the pieces are in place for Agent V1, we finally have to implement how the agent will adapt to its environment. In #17 and #21, we argued for a dual learning cycle that will allow us for…
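Since the arguments in the referenced issues are not reproduced here, the following is only a guessed skeleton of what a dual learning cycle could mean: a fast inner cycle that adapts within an episode and a slow outer cycle that consolidates across episodes. Every name in it is hypothetical:

```python
# Purely hypothetical skeleton of a dual learning cycle; none of these names
# come from the project or the referenced issues.
class DualCycleAgent:
    def __init__(self):
        self.fast_state = {}  # short-lived, per-episode adaptation
        self.slow_state = {}  # long-lived, consolidated knowledge

    def fast_step(self, observation):
        """Inner cycle: adapt to the current environment at every step."""
        self.fast_state["last_obs"] = observation

    def consolidate(self):
        """Outer cycle: fold per-episode adaptations into long-term state."""
        self.slow_state.update(self.fast_state)
        self.fast_state.clear()
```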