-
Hello, I was trying to train a Hopper Gym agent with the rllab++ version of DDPG found in
`sandbox/rocky/tf/algos/ddpg.py`.
I initially ran the experiment as suggested by @shaneshixiang with the …
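For context, here is a minimal sketch of the usual rllab launch pattern, modeled on the classic Theano rllab DDPG example; the TF version in `sandbox/rocky/tf` has analogous classes, so treat these exact import paths and constructor arguments as assumptions:

```python
from rllab.algos.ddpg import DDPG
from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize
from rllab.exploration_strategies.ou_strategy import OUStrategy
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy
from rllab.q_functions.continuous_mlp_q_function import ContinuousMLPQFunction

env = normalize(GymEnv("Hopper-v1"))

policy = DeterministicMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
es = OUStrategy(env_spec=env.spec)  # Ornstein-Uhlenbeck exploration noise
qf = ContinuousMLPQFunction(env_spec=env.spec)

algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    batch_size=32,
    max_path_length=1000,
    epoch_length=1000,
    n_epochs=1000,
    discount=0.99,
)
algo.train()
```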
-
I am working on a reinforcement learning task that requires computing predictions a very large number of times. I have found that 56.87% of cumulative time is spent in the **_predict_loop** method. Also I have found t…
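For context, `_predict_loop` is the internal batching loop behind Keras's `model.predict`; a common mitigation is to batch the states and call `predict_on_batch` once instead of calling `predict` per sample. A minimal sketch (the model architecture here is purely illustrative):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(64, activation='relu', input_dim=4),
                    Dense(2)])
model.compile(optimizer='adam', loss='mse')

states = np.random.rand(1000, 4).astype('float32')

# Slow: 1000 calls to predict(), each paying the full _predict_loop overhead.
q_slow = np.vstack([model.predict(s[None, :]) for s in states])

# Faster: a single batched call that skips the per-call loop machinery.
q_fast = model.predict_on_batch(states)
```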
-
Operating system: Ubuntu 16.04 x64
numpy version: '1.13.1'
python version: Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linu…
-
Hi,
Thanks for the great implementation. I am currently learning RL and trying to adapt paac to a simple CartPole use case. I made modifications to the paac code to include a new environment…
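As a point of comparison, here is a minimal sketch of how one might wrap CartPole behind an emulator-like interface; the method names are illustrative assumptions, not paac's actual environment API:

```python
import gym
import numpy as np

class CartPoleEmulator(object):
    """Hypothetical adapter exposing CartPole through an Atari-emulator-style
    interface (method names are illustrative, not paac's actual API)."""

    def __init__(self, seed=0):
        self.env = gym.make('CartPole-v0')
        self.env.seed(seed)
        self.num_actions = self.env.action_space.n

    def get_initial_state(self):
        # Reset the episode and return the first observation.
        return np.asarray(self.env.reset(), dtype=np.float32)

    def next(self, action):
        # Advance one step; return (observation, reward, episode_over).
        obs, reward, done, _ = self.env.step(int(action))
        return np.asarray(obs, dtype=np.float32), reward, done
```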
-
http://arxiv.org/pdf/1602.01783v1.pdf describes asynchronous methods for both off-policy (one-step and n-step Q-learning) and on-policy (Sarsa and advantage actor-critic, A3C) reinforcement learning.
T…
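The core of the n-step variants is the backward recursion for discounted returns, bootstrapped from the value estimate of the last state reached; a minimal sketch:

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1},
    seeded with V(s_{t+n}) as in the A3C paper's n-step updates."""
    returns = np.zeros(len(rewards))
    R = bootstrap_value
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns
```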
-
Dear Danny
Thank you for the great work! I have two questions:
**1- Is it possible to change the “CliffWalk Actor Critic Solution.ipynb” code to implement Actor-Critic for Gym Atari games?**
I b…
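One practical difference is the state representation: CliffWalk uses a discrete state index, whereas Atari games emit RGB frames that are typically preprocessed before being fed to the actor-critic. An illustrative sketch of the common crop/downsample/grayscale step (not code from the notebook):

```python
import numpy as np

def preprocess_frame(frame):
    """Reduce a (210, 160, 3) Atari RGB frame to a small grayscale array:
    crop away the score area, downsample by 2, average the color channels."""
    cropped = frame[35:195]            # keep the play area
    downsampled = cropped[::2, ::2]    # now (80, 80, 3)
    return downsampled.mean(axis=2).astype(np.float32) / 255.0
```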
-
Hi Philip, I was wondering whether it's possible to manually set the emulator speed. It'd be nice to further increase the speed, say to 5000%, during training. Additionally, when demoing the RL agent,…
-
Setting up openai/universe, I used the "universe starter agent" as a smoke test.
After adjusting the number of workers to better utilize my CPU, I saw the default PongDeterministic-v3 start winnin…
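For reference, a small sketch of how one might derive the worker count from the machine's core count before launching the starter agent (the `--num-workers` and `--env-id` flags are from the starter agent's README; leaving one core free is just a rule of thumb):

```python
import multiprocessing

# One A3C worker per core, minus one core for the parameter server and OS.
num_workers = max(1, multiprocessing.cpu_count() - 1)
print('python train.py --num-workers %d --env-id PongDeterministic-v3'
      % num_workers)
```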
-
Hi,
A non-technical question, I hope it's OK to ask here on GitHub...
I am working on continuous robot control problems and was wondering which approach you are following for the continuous branch. …
-
Implement an algorithm that learns a common baseline for Q-values:
http://arxiv.org/pdf/1301.2315.pdf
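One natural parameterization of a shared baseline decomposes Q into a state value plus centered advantages; whether this matches the linked paper's exact formulation is an assumption (the same idea was later popularized by dueling architectures):

```python
import numpy as np

def q_with_common_baseline(v, advantages):
    """Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
    Centering the advantages makes V identifiable as the common baseline."""
    centered = advantages - advantages.mean(axis=-1, keepdims=True)
    return v[..., None] + centered

# Example: batch of 2 states, 3 actions each.
v = np.array([1.0, -0.5])
adv = np.array([[0.2, 0.0, -0.2],
                [1.0, 0.5, 0.0]])
print(q_with_common_baseline(v, adv))
```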