-
Hi, the DPG critic update (see Algorithm 1 of Lillicrap et al. 2016, https://arxiv.org/abs/1509.02971) is substantively the same as your td_learning function; however, this is currently obscured. I wo…
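For concreteness, here is a toy numeric sketch of why the two coincide (my own illustration, not code from the repo; all scalars are placeholders): the DPG critic target r + γ·Q′(s′, μ′(s′)) is just a TD(0) target in which the bootstrap value happens to come from the target critic and target actor.
```python
import numpy as np

# Placeholder scalars standing in for one transition.
r_t, gamma = 1.0, 0.99
q_tm1 = 2.0       # Q(s_{t-1}, a_{t-1}) from the online critic
bootstrap = 3.0   # Q'(s_t, mu'(s_t)) from the target critic/actor

# DPG critic update (Algorithm 1): y = r + gamma * Q'(s', mu'(s'))
y_dpg = r_t + gamma * bootstrap

# Generic TD(0) update: target = r + gamma * v_t, loss on (target - v_tm1)
y_td = r_t + gamma * bootstrap
assert np.isclose(y_dpg, y_td)  # identical targets

td_error = y_td - q_tm1
critic_loss = 0.5 * td_error ** 2  # squared TD error, as in a td_learning-style loss
```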
-
Hi there, thanks for sharing your code -- it's been very helpful!
One question: is your A2C implementation a 'genuine' actor-critic method? My (limited) understanding was that to qualify as …
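For what it's worth, the criterion I had in mind is that the critic's estimate is bootstrapped into the target (and hence the advantage), rather than only subtracted from Monte-Carlo returns as a baseline. A minimal one-step sketch of that pattern (my own code, not the repo's; the networks and dummy data are placeholders):
```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99
actor  = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))

# Dummy batch of transitions.
s      = torch.randn(8, obs_dim)
s_next = torch.randn(8, obs_dim)
a      = torch.randint(n_actions, (8,))
r      = torch.randn(8)
done   = torch.zeros(8)

log_prob = torch.log_softmax(actor(s), dim=-1).gather(1, a.unsqueeze(1)).squeeze(1)
v_s = critic(s).squeeze(1)
with torch.no_grad():
    v_next = critic(s_next).squeeze(1)       # critic bootstraps the target
target    = r + gamma * (1 - done) * v_next  # one-step TD target
advantage = (target - v_s).detach()          # critic-based advantage

actor_loss  = -(log_prob * advantage).mean()
critic_loss = 0.5 * (target - v_s).pow(2).mean()
(actor_loss + critic_loss).backward()
```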
-
## 🚀 Feature
Implement target derivative for `F.smooth_l1_loss`
## Motivation
I'm implementing an actor-critic algorithm. On the TD update step, I need gradients through both the input and the ta…
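One workaround I'm using for now is to write the smooth-L1/Huber expression out by hand, so autograd differentiates with respect to both arguments (only a sketch; the helper name is mine, and `beta=1.0` is assumed to match the standard definition):
```python
import torch

def smooth_l1_both_grads(pred, target, beta=1.0):
    # Elementwise smooth-L1 / Huber written out explicitly, so gradients
    # flow into both `pred` and `target` (hypothetical helper).
    diff = pred - target
    abs_diff = diff.abs()
    loss = torch.where(abs_diff < beta,
                       0.5 * diff.pow(2) / beta,
                       abs_diff - 0.5 * beta)
    return loss.mean()

q_pred = torch.randn(32, requires_grad=True)     # critic output
td_target = torch.randn(32, requires_grad=True)  # target that also needs grads
smooth_l1_both_grads(q_pred, td_target).backward()
print(q_pred.grad is not None, td_target.grad is not None)  # True True
```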
-
-
# Next paper candidates
Let's propose papers to study next! All papers mentioned in the comments of this issue will be listed in the next vote.
-
When I try to import trfl, just as in [this](https://colab.research.google.com/drive/1yP8E9_CCO4NZ5XMYYrPOqSLfR4LlVeB0#scrollTo=Axy2D-N7InE9) public trfl colab notebook, I get
(Note I tri…
-
Hello,
I am wondering how you provide the training flag for the batch normalization layers in the architecture that specifies your actor-critic function.
During inference when generating the actions …
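Concretely, is the intent something like the following? This is only a minimal Keras-style sketch, not your actual architecture (the `make_actor` helper and the dimensions are placeholders): pass `training=True` on the training path and `training=False` when generating actions, so BatchNorm switches between batch statistics and its moving averages.
```python
import tensorflow as tf

def make_actor(obs_dim, act_dim):
    # Hypothetical actor network with batch normalization.
    obs = tf.keras.Input(shape=(obs_dim,))
    h = tf.keras.layers.Dense(256)(obs)
    h = tf.keras.layers.BatchNormalization()(h)  # honours the `training` flag at call time
    h = tf.keras.layers.Activation("relu")(h)
    act = tf.keras.layers.Dense(act_dim, activation="tanh")(h)
    return tf.keras.Model(obs, act)

actor = make_actor(obs_dim=8, act_dim=2)
obs_batch = tf.random.normal((32, 8))

# Training graph: use batch statistics and update the moving averages.
train_actions = actor(obs_batch, training=True)

# Inference / action generation: use the stored moving averages instead.
infer_actions = actor(obs_batch, training=False)
```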
-
The policy loss in the HER+DDPG implementation is defined as follows:
```
self.pi_loss_tf = -tf.reduce_mean(self.main.Q_pi_tf)
self.pi_loss_tf += self.action_l2 * tf.reduce_mean(tf.square(self.ma…
```
-
I understand how to use the keras-rl framework in a limited train/test workflow, as demonstrated in some of the samples.
But how would one implement keras-rl in a scenario where one wants to depl…
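In case it helps, here is the kind of deployment flow I'm experimenting with. It is only a sketch: `dqn` is assumed to be the compiled keras-rl agent from the training script, the weights file name and the `act` helper are mine, and the observation has to be shaped to match the agent's model input (including keras-rl's window dimension).
```python
import numpy as np

# After training in the usual workflow (e.g. dqn.fit(env, ...)):
dqn.save_weights("dqn_weights.h5f", overwrite=True)

# In the deployed process, rebuild the same model/agent, then reload:
dqn.load_weights("dqn_weights.h5f")

def act(observation):
    # Query the agent's underlying Keras model directly instead of running
    # agent.test(); the input must match the model's expected shape.
    q_values = dqn.model.predict(np.asarray(observation)[None, ...])[0]
    return int(np.argmax(q_values))
```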
-
Is this extensible to policy-gradient or actor-critic architectures, or would one have to do major rework? I'm trying to decide whether to use this framework for a project or implement from scra…