-
How is `low_level_load_path` in train.py and config_ppo.yaml generated?
evaluate.py sets both `lower_model` and `upper_model`.
I get the error "Encoder type cnn not supported!"
I have tried all four upper_model checkpoints; after loading, `encoder_type` is `cnn` rather than `pixel`.
Is there a more detailed introduction to training or evaluation?
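One way to narrow this down is to inspect what the checkpoints actually store; a minimal sketch, assuming PyTorch checkpoints with a dict layout (the file name and the "encoder_type" key are guesses, not this repo's real schema):

```python
import torch

# Hypothetical: inspect a saved upper_model checkpoint to see which
# encoder_type it carries. The path and key names are assumptions.
ckpt = torch.load("upper_model.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print(ckpt.keys())                # look for a stored config section
    print(ckpt.get("encoder_type"))   # assumed key; may differ per repo
```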
-
1) counter
2) for index in BatchSampler(SubsetRandomSampler(range(self.buffer_capacity)), self.batch_size, True):
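Note that `batch_size` and `drop_last` are arguments of `BatchSampler`, not `SubsetRandomSampler`; a runnable sketch of the corrected sampling loop (buffer sizes are made up):

```python
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler

buffer_capacity, batch_size = 8, 3
# SubsetRandomSampler shuffles the buffer indices; BatchSampler groups them
# into minibatches and (with drop_last=True) drops the final short batch.
for index in BatchSampler(SubsetRandomSampler(range(buffer_capacity)), batch_size, True):
    print(index)  # e.g. [5, 0, 7] — one shuffled minibatch of buffer indices
```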
-
PPO + LSTM has an extra hyperparameter, the BPTT (backpropagation-through-time) horizon. Is it possible to set it?
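The question doesn't name the framework; a minimal sketch assuming Ray RLlib, where the truncated-BPTT length for recurrent models is exposed as `model.max_seq_len` (legacy config-dict format):

```python
# Assumption: Ray RLlib. With use_lstm=True, training sequences are cut to
# max_seq_len time steps, which acts as the truncated-BPTT horizon.
config = {
    "env": "CartPole-v1",
    "model": {
        "use_lstm": True,
        "lstm_cell_size": 64,
        "max_seq_len": 20,  # BPTT horizon in time steps
    },
}
```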
-
Hello,
I was wondering whether the model weights made available at https://notanymike.github.io/rl/2017/12/18/Solving-CarRacing.html were produced using the PPO hyperparameters from the original Schulman …
-
Could you provide the PPO codebase that can reproduce the results of the paper? I have not found it in this repo. Thank you!
-
Release test **rllib_learning_tests_pong_ppo_torch.aws** failed. See https://buildkite.com/ray-project/release/builds/16725#018fe1f2-a6ac-4002-b08b-6d5c34f87e40 for more details.
Managed by OSS Test …
-
# Reference
- 07/2017 [Proximal policy optimization algorithms](https://arxiv.org/abs/1707.06347)
# Brief
- Based on Policy Gradient (PG); see the clipped objective after this list
-
- https://openai.com/blog/openai-baselines-ppo/
- https://medium.com/intro-to-artificial-intelligence/proximal-policy-optimization-ppo-a-policy-based-reinforcement-learning-algorithm-3cf126a7562d
- …
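For reference, the clipped surrogate objective from the Schulman et al. paper above, where $r_t(\theta)$ is the probability ratio between the new and old policies:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$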
-
I don't think this code can solve the problem (Pendulum). Another question: why is the reward tracked as 'running_reward * 0.9 + score * 0.1'?
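For what it's worth, that update is an exponential moving average of episode scores; a minimal sketch with made-up Pendulum returns:

```python
# running_reward is an exponential moving average (EMA) of episode scores:
# each new score shifts it by only 10%, smoothing out noisy single episodes,
# which is why it is often used as a "solved" criterion instead of raw score.
running_reward = -1000.0  # pessimistic initial value (assumption)
for score in [-1200.0, -900.0, -500.0, -300.0]:  # made-up episode returns
    running_reward = running_reward * 0.9 + score * 0.1
    print(f"score={score:.0f}, running_reward={running_reward:.1f}")
```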