-
Hi,
For some reason, when I run the trained A2C agent in CARLA, it doesn't take any actions; it just sits there doing nothing. These are my terminal outputs:
(rl_gym_book) amakri@amakri-Zephy…
-
1. The critic network is used to compute V. Its loss function should, like DQN's, iteratively fit V towards the true value, and then the advantage is obtained as A = Q - V. But your loss function minimizes the advantage. How can you minimize the advantage? Our goal is to increase the advantage function as much as possible.
```
# critic
with tf.variable_scope('critic'):
    l1 = tf.la…
```
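To clarify the confusion above: the advantage itself is not minimized. The critic's loss minimizes the value-estimation error (fitting V towards a bootstrapped target, much like DQN's TD fit), while the actor's loss is -log π(a|s) · A, so minimizing it performs gradient ascent on the expected advantage. A minimal PyTorch sketch of this standard A2C loss pair (an illustration, not the exact code under discussion):

```python
import torch

def a2c_losses(log_prob, value, td_target):
    """Standard A2C losses (an illustration, not the repo's exact code).

    log_prob : log pi(a_t | s_t) for the action the actor took
    value    : V(s_t) predicted by the critic
    td_target: bootstrapped return, treated as a constant target
    """
    advantage = td_target.detach() - value       # A = Q - V, with Q estimated by the return
    critic_loss = advantage.pow(2).mean()        # fit V towards the target, like DQN's TD loss
    # Minimizing -log_prob * A is gradient ASCENT on expected advantage;
    # detach() keeps actor gradients out of the critic.
    actor_loss = -(log_prob * advantage.detach()).mean()
    return actor_loss, critic_loss

# Example shapes: one batch of 4 transitions
log_prob = torch.randn(4, requires_grad=True)
value = torch.randn(4, requires_grad=True)
td_target = torch.randn(4)
actor_loss, critic_loss = a2c_losses(log_prob, value, td_target)
(actor_loss + 0.5 * critic_loss).backward()      # typical combined update
```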
-
Hi,
I have a question about the plots presented in `Ch8`, in the **"Training and testing the deep n-step advantage actor-critic agent"** section of the book.
The TensorBoard plots in this se…
-
Hi, today I studied a2c_agent.py, the actor-critic implementation. I tested it in several simple environments, and I found that this implementation needs millions of steps to reach the optimal policy. …
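For reference, the n-step bootstrapped target that an n-step advantage actor-critic regresses its critic towards can be sketched as below (a minimal sketch, not the exact a2c_agent.py code); larger n propagates reward information further per update, which is one factor in how many environment steps training takes:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step bootstrapped target (a sketch, not the exact a2c_agent.py code):
    G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * V(s_n)."""
    g = bootstrap_value
    for r in reversed(rewards):   # fold rewards back from the bootstrap state
        g = r + gamma * g
    return g

# e.g. a 3-step target with V(s_3) = 0.5:
print(n_step_return([1.0, 0.0, 1.0], 0.5))
```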
-
@praveen-palanisamy thank you for your extremely helpful code base. I have some questions that I hope you can give me some insight into:
- I noticed that the reward function is the one that was int…
-
From the end of section 3 in the GAE paper: **High-Dimensional Continuous Control Using Generalized Advantage Estimation**
https://arxiv.org/pdf/1506.02438.pdf
```
Taking γ < 1 introduces bias in…
```
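The quoted passage concerns the estimator in which γ < 1 introduces bias even before λ enters. A minimal NumPy sketch of the GAE recursion from the paper, Â_t = δ_t + γλ Â_{t+1} with δ_t = r_t + γV(s_{t+1}) - V(s_t) (an illustration, with assumed array inputs):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE advantages (a sketch of the paper's estimator, assumed array inputs).

    rewards: r_0 .. r_{T-1};  values: V(s_0) .. V(s_T), one extra bootstrap entry.
    gamma < 1 biases the estimate even when lam = 1, as the quoted passage notes;
    lam < 1 trades additional bias for lower variance.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    deltas = rewards + gamma * values[1:] - values[:-1]   # TD residuals delta_t
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):               # A_t = delta_t + gamma*lam*A_{t+1}
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```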
-
I was going through your code and noticed this:
![random action](https://user-images.githubusercontent.com/11025093/43618534-df4a1da2-9703-11e8-9ac3-fb0cb9589359.png)
Why are you using np.random…
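The exact call is truncated above, but a common pattern in such agents is to sample the action from the categorical distribution given by the actor's output probabilities, which is stochastic-policy sampling rather than a uniformly random action. A hypothetical illustration (the probabilities and the np.random.choice call are assumptions, not the screenshot's exact code):

```python
import numpy as np

action_probs = np.array([0.1, 0.7, 0.2])   # pi(a|s) from the actor's softmax
action = np.random.choice(len(action_probs), p=action_probs)
print(action)  # a stochastic sample that follows the policy, not a uniform pick
```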
-
Hello, thank you for this great work. I have a few questions, as I am a newbie in Reinforcement Learning.
Would it be possible to use a single network with multiple heads rather than two networks? I am actually t…
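A single network with a shared trunk and separate actor and critic heads is a common alternative to two networks. A minimal PyTorch sketch (hypothetical layer sizes; not the book's exact architecture):

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """One network, two heads (hypothetical sizes; not the book's exact model)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor_head = nn.Linear(hidden, n_actions)   # policy logits
        self.critic_head = nn.Linear(hidden, 1)          # state value V(s)

    def forward(self, obs):
        h = self.trunk(obs)                              # shared features
        return self.actor_head(h), self.critic_head(h)

logits, value = SharedActorCritic(obs_dim=8, n_actions=4)(torch.randn(1, 8))
```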
-
Hello @praveen-palanisamy
I went through your book and it's really helpful.
I have a question regarding "Using TensorBoard for logging and visualizing a PyTorch RL agent's progress" p. 107 (chap …
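For readers following that section, the basic logging pattern looks like the sketch below (assuming tensorboardX's SummaryWriter and a hypothetical log directory; the scalar values here are stand-ins):

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter("logs/a2c_run")         # hypothetical log directory
for step in range(100):
    episode_reward = float(step)               # stand-in for the agent's real reward
    writer.add_scalar("reward/episode_reward", episode_reward, step)
writer.close()
# View with: tensorboard --logdir=logs  (then open http://localhost:6006)
```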