-
Hello! I was documenting your PPO code `algo/ppo.py` to improve my understanding of the algorithm, and I got confused by `max_grad_norm` and `_use_clipped_value_loss`.
If I am understanding this co…
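In case it helps frame the question, here is a minimal PyTorch-style sketch of what these two options usually mean in PPO implementations (the function name and defaults below are my own illustration, not necessarily what `algo/ppo.py` does): `_use_clipped_value_loss` typically selects a PPO2-style clipped value loss, and `max_grad_norm` is typically passed to `torch.nn.utils.clip_grad_norm_` after the backward pass.

``` python
import torch

def ppo_value_loss(values, old_values, returns,
                   clip_param=0.2, use_clipped_value_loss=True):
    """Illustrative PPO value loss following the common PPO2 recipe."""
    if use_clipped_value_loss:
        # Keep the new value prediction within clip_param of the old one,
        # then take the worse (elementwise max) of the two squared errors,
        # which pessimistically limits how far the critic can move per update.
        values_clipped = old_values + (values - old_values).clamp(-clip_param, clip_param)
        loss_unclipped = (values - returns).pow(2)
        loss_clipped = (values_clipped - returns).pow(2)
        return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
    return 0.5 * (values - returns).pow(2).mean()

# max_grad_norm would then be applied after loss.backward(), e.g.:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
```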
-
Hi I have declared an agent like this
``` python
agent = PPOAgent(
    states=dict(type='float', shape=(37,)),
    actions=dict(type='int', num_actions=3),
    network=[
…
-
In Deep Q-learning, it is said that the samples are not independent of one another, which violates the supervised-learning assumption, so techniques like experience replay are introduced during training to weaken the correlations between data points.
So in basic policy gradient methods, doesn't the data used to train the actor also have this non-independence problem? Hardly anyone online seems to discuss this for PG; what is your view?
If the problem exists, then with PPO's introduction (besides keeping two consecutive updates from differing too much so the network doesn't collapse), its off-pol…
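For reference, the part of PPO the question is pointing at is the clipped surrogate objective. A minimal sketch (function name and defaults are my own illustration): the importance ratio lets PPO reuse data collected under the old policy, and the clip cuts the gradient off once the new policy drifts more than `eps` away from the behavior policy.

``` python
import torch

def ppo_clip_objective(log_probs_new, log_probs_old, advantages, eps=0.2):
    # Importance ratio r_t = pi_new(a|s) / pi_old(a|s). Clipping this ratio
    # is what makes limited reuse of slightly off-policy data safe: outside
    # the [1-eps, 1+eps] band the objective is flat and the gradient vanishes.
    ratio = torch.exp(log_probs_new - log_probs_old)
    surr_unclipped = ratio * advantages
    surr_clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return torch.min(surr_unclipped, surr_clipped).mean()  # to be maximized
```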
-
## Description
We noticed our gluon/MXNet [Proximal Policy Optimization](https://arxiv.org/abs/1707.06347) (PPO) implementation is under-performing compared to the OpenAI Baselines version in TensorF…
-
I have the following code
``` python
import gym
import numpy as np
from tensorforce.agents import PPOAgent
from tensorforce.agents import TRPOAgent
#from tensorforce import Configuration
NUM_GAMES_T…
```
-
In Distributed Proximal Policy Optimization (DPPO) (Tensorflow) you mention not letting the workers compute and apply gradients, but only transmit data (observations), which really makes PPO fly. This idea may even be ahead of DeepMind's IMPALA parallel-agent architecture (http://i.dataguru.cn/mportal.php?aid=13103&mod=view).
From…
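As a rough sketch of that data-passing pattern (threads standing in for distributed workers, and all names below are illustrative, not from any of the implementations mentioned): workers only collect trajectories and push them onto a shared queue, and a single learner drains the queue and would run the PPO update itself.

``` python
import queue
import random
import threading

def worker(seed, out_queue, steps=8):
    # Workers only gather (observation, action, reward) tuples;
    # they never compute or apply gradients themselves.
    rng = random.Random(seed)
    trajectory = [(rng.random(), rng.randint(0, 2), rng.random())
                  for _ in range(steps)]
    out_queue.put(trajectory)

def collect(num_workers=2):
    # The central learner gathers data from every worker and performs
    # the PPO update itself -- the actor/learner split IMPALA formalized.
    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, q))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [q.get() for _ in range(num_workers)]
```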
-
https://blog.openai.com/openai-baselines-ppo/
OpenAI says PPO has become their default RL algorithm. Should we get a PPO implementation going in TensorGraph?
CC @peastman
-
I was implementing Proximal Policy Optimization when I noticed that my PyTorch version was outdated, so I updated. To my surprise, the code I was running, which worked fine in 0.1.9, was completely brok…