-
I have tried running the PPO, DDPG, and VPG examples on CarRacing-v0 and consistently receive the same ValueError:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 96 for 'v/Squeeze'…
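For context, this class of error typically means the raw 96x96 image observation reaches a value head that expects a scalar per sample. A minimal NumPy illustration of the same failure mode (`tf.squeeze` behaves analogously; the shapes here are my assumption about what reaches the squeeze op, not taken from the repo):

```python
import numpy as np

# CarRacing-v0 observations are 96x96 RGB images.
obs_batch = np.zeros((1, 96, 96, 3))

# A value head squeezing axis 1 expects shape (batch, 1) -- one scalar
# value per sample -- not a raw image observation.
try:
    np.squeeze(obs_batch, axis=1)  # axis 1 has size 96, not 1
except ValueError as e:
    print("squeeze failed:", e)

# After the observation is encoded down to a (batch, 1) value prediction,
# the squeeze succeeds and yields shape (batch,).
value = np.zeros((1, 1))
squeezed = np.squeeze(value, axis=1)
print(squeezed.shape)
```

If that matches what you are seeing, the fix is usually to make sure the image passes through the policy/value network (e.g. a conv encoder) before the scalar value is squeezed.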
-
If I take the DDPG example and add a dropout layer to either the actor or the critic model, I get an AssertionError from Theano (but not from TensorFlow):
```
Traceback (most recent call last):
File "dd…
```
-
I am running the examples on my Ubuntu machine with an Intel® Core i7-4770K CPU @ 3.50GHz (4 cores). During the entire training, only ~25% of the CPU is used, which suggests it is running on only one core. Am…
-
1. [Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards](https://arxiv.org/pdf/1707.08817.pdf)
-
Hi,
I have a question about the plots presented in `Ch8`, in the section **"Training and testing the deep n-step advantage actor-critic agent"** of the book.
The TensorBoard plots in this se…
-
Hi @normandipalo, amazing implementation of PPO.
I tried running the code for 10,000 episodes. In the end, the robot acquires a behaviour of moving the block randomly, which is intuitive as it is trained…
-
Hi there, thanks for sharing your code -- it's been very helpful!
One question: is your implementation of the A2C a 'genuine' actor-critic method? My (limited) understanding was that to qualify as …
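As a reference point for the question: an actor-critic usually bootstraps from a learned value function rather than using full Monte Carlo returns. A minimal NumPy sketch of the n-step advantage estimate (variable and function names are mine, not from this repo):

```python
import numpy as np

def n_step_advantage(rewards, values, bootstrap_value, gamma=0.99):
    """n-step advantage: (r_0 + g*r_1 + ... + g^n * V(s_n)) - V(s_0).

    rewards:          rewards r_0 .. r_{n-1} collected over n steps
    values:           critic estimates V(s_0) .. V(s_{n-1})
    bootstrap_value:  V(s_n), the critic's estimate at the final state
    """
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret - values[0]

adv = n_step_advantage([1.0, 0.0, 1.0], [0.5, 0.4, 0.3], 0.2, gamma=0.9)
print(round(adv, 4))  # 1.4558
```

The bootstrap term `V(s_n)` is what distinguishes an actor-critic from a pure REINFORCE-with-baseline setup, where the full return is used and the value function only reduces variance.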
-
Hi, the DPG critic update (see Algorithm 1 of Lillicrap et al. 2016, https://arxiv.org/abs/1509.02971) is substantively the same as your td_learning function; however, this is currently obscured. I wo…
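To make the correspondence concrete: the critic update in Algorithm 1 is one-step TD learning, with target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) and a squared-error loss against Q(s_i, a_i). A hedged NumPy sketch (function names are mine, not the repo's `td_learning`):

```python
import numpy as np

def td_target(reward, next_q, done, gamma=0.99):
    """One-step TD target: y = r + gamma * Q'(s', mu'(s')), 0 bootstrap if terminal."""
    return reward + gamma * next_q * (1.0 - done)

def critic_loss(q, reward, next_q, done, gamma=0.99):
    """Mean squared TD error, as in the DDPG critic update."""
    y = td_target(reward, next_q, done, gamma)
    return np.mean((y - q) ** 2)

q      = np.array([1.0, 2.0])   # critic Q(s, a) on a batch of transitions
reward = np.array([0.5, 1.0])
next_q = np.array([1.0, 0.0])   # target critic on target-actor actions
done   = np.array([0.0, 1.0])   # second transition is terminal
print(critic_loss(q, reward, next_q, done, gamma=0.9))  # 0.58
```

Seen this way, the DDPG-specific part is only where `next_q` comes from (target networks and the deterministic target actor); the update rule itself is plain TD(0).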
-
# Next paper candidates
Let's propose papers to study next! All papers mentioned in the comments of this issue will be listed in the next vote.