-
Could you provide the PPO codebase that can reproduce the results of the paper? I have not found it in this repo. Thank you!
-
# Reference
- 07/2017 [Proximal policy optimization algorithms](https://arxiv.org/abs/1707.06347)
# Brief
- Based on Policy Gradient (PG) methods
-
- https://openai.com/blog/openai-baselines-ppo/
- https://medium.com/intro-to-artificial-intelligence/proximal-policy-optimization-ppo-a-policy-based-reinforcement-learning-algorithm-3cf126a7562d
- …
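The clipped surrogate objective at the core of PPO can be sketched in a few lines; this is a minimal NumPy illustration (function name and shapes are my own, `ratio` stands for the probability ratio π_new(a|s)/π_old(a|s)):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective from the PPO paper (to be maximized)."""
    unclipped = ratio * advantage
    # Clipping the ratio to [1 - eps, 1 + eps] caps the incentive to move
    # the new policy far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum keeps the pessimistic (lower) bound.
    return np.minimum(unclipped, clipped)

# With a positive advantage, gains are capped once ratio exceeds 1 + eps:
value = ppo_clip_loss(np.array([1.5]), np.array([1.0]))  # capped at 1.2
```

In practice the negated mean of this quantity is minimized with SGD, alongside a value loss and an entropy bonus.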
-
Feature request to add PPO
-
-
PPO reports 'mean reward' and 'std of reward',
but POCA reports 'mean reward' and 'mean group reward'.
I didn't find a way to get the 'std of reward' in POCA.
I'd appreciate it if it could be obtained in YAML bu…
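As a workaround until the stat is exposed, the standard deviation can always be recomputed offline from per-episode returns; a minimal sketch (the `episode_rewards` values here are made up):

```python
import numpy as np

# Hypothetical per-episode returns collected from a training run
# (e.g. parsed from logs or TensorBoard event files).
episode_rewards = [1.0, 3.0, 2.0, 4.0]

mean_r = float(np.mean(episode_rewards))  # what the trainer already logs
std_r = float(np.std(episode_rewards))    # the missing 'std of reward'
```

This mirrors how 'std of reward' is typically derived from the same per-episode data that produces 'mean reward'.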
-
I wonder whether LSTM+PPO/SAC can be used in Tianshou, since there seem to be some problems.
-
### RNN+PPO
**when I replace the `ActorProb` to `RecurrentActorProb` and `Critic` to `RecurrentCritic` in `test/continuous/test_ppo.py` , the bug is below:**
```
File "E:\ANCONDA\lib\site-pac…
```
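Independent of the specific traceback, the usual pitfall when swapping an MLP actor for a recurrent one is the extra hidden state: the actor now returns `(output, new_hidden)`, and that state must be threaded through the rollout and reset at episode boundaries. A toy NumPy sketch of that contract (all names and shapes are illustrative, not Tianshou's API):

```python
import numpy as np

def recurrent_actor(obs, hidden, W_in, W_h):
    """Toy recurrent policy step: returns (action, new_hidden).
    Stands in for a RecurrentActorProb-style module, which yields a new
    hidden state in addition to its output."""
    new_hidden = np.tanh(obs @ W_in + hidden @ W_h)
    action = new_hidden.sum(axis=-1)  # placeholder "action head"
    return action, new_hidden

rng = np.random.default_rng(0)
W_in = rng.standard_normal((3, 4))
W_h = rng.standard_normal((4, 4))
hidden = np.zeros((1, 4))  # initial state at the start of an episode

for t in range(5):
    obs = rng.standard_normal((1, 3))
    action, hidden = recurrent_actor(obs, hidden, W_in, W_h)
    done = (t == 2)
    if done:
        # Forgetting this reset (or mismatching the state's shape) is a
        # common source of bugs when moving from MLP to recurrent actors.
        hidden = np.zeros_like(hidden)
```

Shape mismatches between what the buffer stores per step and what the recurrent module expects are worth checking first when a traceback like the one above appears.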
-
Hello, I have fine-tuned a [Code-T5](https://huggingface.co/Salesforce/codet5-small) model on my custom dataset. Now, while I was using `trl` to further train the fine-tuned model to align it better,…
-
There are several optimizations to our PPO recipe which could help push it closer to SOTA in terms of performance. There are also several pieces of documentation we could offer alongside this recipe t…