-
When training PPO in ColossalChat, two models are needed: an actor and a critic. Can these two models be different? For example, could the critic use a BERT model while the actor uses a GPT model? In differ…
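For context, a minimal sketch of what I mean (illustrative only, not ColossalChat's actual classes; model names are just examples): the two models only need to expose the interfaces PPO uses, so in principle they could wrap different backbones. One practical catch is that BERT and GPT use different tokenizers, so the two models could not share the same token sequences directly.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM

class Actor(nn.Module):
    """Produces per-token log-probabilities (what PPO needs from the actor)."""
    def __init__(self, name="gpt2"):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(name)

    def forward(self, input_ids, attention_mask=None):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        return torch.log_softmax(out.logits, dim=-1)

class Critic(nn.Module):
    """Produces one scalar value per sequence (what PPO needs from the critic)."""
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.value_head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # use the first-token ([CLS]) hidden state as the sequence summary
        return self.value_head(out.last_hidden_state[:, 0]).squeeze(-1)
```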
-
### 🐛 Bug
I've adapted the environment from this [blog post](https://medium.com/hackernoon/learning-policies-for-learning-policies-meta-reinforcement-learning-rl%C2%B2-in-tensorflow-b15b592a2ddf), …
-
### System Info
Does training support distributed setups and larger models such as qwen72b?
### Who can help?
@morning9393
### Information
- [X] The official example scripts
- [X] My own modified scripts
### Tasks
- [x] An offic…
-
Hi,
The current PPO implementation does not seem to account for time limits. While the `EpisodeWrapper` from brax is used, which tracks a truncation flag ([source](https://github.com/google/brax/bl…
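For reference, here is a minimal numpy sketch (not the repo's actual code) of how advantage estimation usually distinguishes true terminals from time-limit truncations. The array names are mine, and I'm assuming `next_values[t]` holds the value of the state actually reached after step t (i.e., the pre-reset observation at a truncation):

```python
import numpy as np

def gae(rewards, values, next_values, terminations, episode_ends,
        gamma=0.99, lam=0.95):
    # rewards[t]: reward at step t
    # values[t]: V(s_t); next_values[t]: V of the state reached after step t
    # terminations[t]: 1.0 only for a true terminal state
    # episode_ends[t]: 1.0 for termination OR time-limit truncation
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # only a true terminal has zero future value; a time-limit
        # truncation still bootstraps from the value of the next state
        delta = rewards[t] + gamma * next_values[t] * (1.0 - terminations[t]) - values[t]
        # never carry the GAE trace across an episode boundary of either kind
        last = delta + gamma * lam * (1.0 - episode_ends[t]) * last
        adv[t] = last
    returns = adv + values
    return adv, returns
```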
-
Hi, how am I supposed to save an expert demo in ppo main?
-
Guys, Keras-rl is the best reinforcement learning library.
It is easy to use despite the complexity of the RL algorithms.
Keras-rl is far better than Stable Baselines.
Please add PPO, A3C, and others, as DQN is …
-
In simply_PPO you multiply the action distribution's (Gaussian) mu by 2; why is that?
`mu = 2 * tf.layers.dense(l1, A_DIM, tf.nn.tanh, trainable=trainable)`
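My guess is that the 2 matches the environment's action bound (Pendulum-v0's actions live in [-2, 2], while tanh only reaches [-1, 1]). A tiny sketch of that general pattern, with the bound stated as an assumption:

```python
import numpy as np

ACT_BOUND = 2.0  # assumption: Pendulum-v0's action space is [-2, 2]

def scaled_mu(pre_activation):
    # tanh squashes to [-1, 1]; multiplying by the bound maps the mean
    # onto the full action range, which is presumably what the 2 does
    return ACT_BOUND * np.tanh(pre_activation)

print(scaled_mu(np.array([10.0, -10.0, 0.0])))  # ~[ 2., -2.,  0.]
```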
-
Hello, apologies if I do this wrong; I don't contribute to open source often. I was attempting to run the PyTorch PPO implementation and kept getting several errors regarding the dimension of the obser…
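In case it helps narrow this down, this is the shape contract I'd expect a PyTorch policy to want (a generic sketch, not the repo's code; the layer sizes are made up): a float tensor with an explicit batch dimension.

```python
import numpy as np
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

obs = np.zeros(4, dtype=np.float32)        # what env.reset()/step() typically returns
obs_t = torch.as_tensor(obs).unsqueeze(0)  # shape (1, 4): add the batch dimension
logits = policy(obs_t)                     # shape (1, 2)
print(logits.shape)
```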
-
Sir, how about asking Asui to move from Tempo to Cash? Keep it up, bro.
-
## Problem Description
Would it be useful to add a complex (nested/dictionary) action and obs space variant of the PPO algo? I did this for `minerl` and wondered if it would be useful to contribute i…
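To make the proposal concrete, here is a minimal sketch (illustrative only, not what I wrote for `minerl`): one encoder per dictionary key, with the features concatenated before the shared policy/value heads. The keys and sizes below are made up.

```python
import torch
import torch.nn as nn

class DictEncoder(nn.Module):
    def __init__(self, sizes, hidden=64):
        super().__init__()
        # one small MLP encoder per observation key
        self.encoders = nn.ModuleDict(
            {k: nn.Sequential(nn.Linear(n, hidden), nn.ReLU()) for k, n in sizes.items()}
        )
        self.out_dim = hidden * len(sizes)

    def forward(self, obs):
        # obs: dict of tensors, each shaped (batch, sizes[key])
        feats = [self.encoders[k](obs[k]) for k in sorted(self.encoders)]
        return torch.cat(feats, dim=-1)

enc = DictEncoder({"pov": 32, "inventory": 8})
obs = {"pov": torch.zeros(2, 32), "inventory": torch.zeros(2, 8)}
print(enc(obs).shape)  # torch.Size([2, 128])
```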