-
Hi!
Let's bring the reinforcement learning course to the whole Russian-speaking community 🌏
Would you like to translate? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/tran…
-
## 🚀 Feature
There seem to be a fair few inefficiencies in the RL model code.
In both the VPG and DQN code, the network is computed twice, once to generate the trajectory and then once again in the…
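One way the VPG path could avoid this is to keep the log-probabilities (and their graph) from the rollout instead of re-running the network when computing the loss. A minimal sketch, assuming a PyTorch-style setup with a Gymnasium-style env; `policy`, `env`, and `horizon` are illustrative names, not the repo's actual API:

```python
import torch

def collect_trajectory(policy, env, horizon):
    """Roll out one episode, keeping the log-probs (with their graph) for the loss."""
    log_probs, rewards = [], []
    state, _ = env.reset()
    for _ in range(horizon):
        dist = policy(torch.as_tensor(state, dtype=torch.float32))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))  # reused directly in the loss
        state, reward, terminated, truncated, _ = env.step(action.numpy())
        rewards.append(reward)
        if terminated or truncated:
            break
    return log_probs, rewards

def vpg_loss(log_probs, returns):
    # No second forward pass: the stored log-probs already carry gradients.
    return -(torch.stack(log_probs) * torch.as_tensor(returns, dtype=torch.float32)).sum()
```

The trade-off is that the whole trajectory's computation graph stays in memory until the update, which is usually acceptable for short on-policy rollouts.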
-
Current setup:
- DQN and Q-learning are already implemented with discrete control actions.

To be introduced:
- A policy gradient algorithm with continuous control actions.

Major changes:
1. Add a continuous…
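As a rough sketch of what the continuous-action policy could look like (names and architecture are illustrative, assuming a PyTorch-style codebase): a diagonal Gaussian whose mean comes from the network, so the policy-gradient update can keep using `log_prob` exactly as in the discrete case.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Maps a state to a diagonal Gaussian over continuous actions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        # State-independent log-std is a common simplification.
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        return torch.distributions.Normal(self.mean(state), self.log_std.exp())
```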
-
You mentioned:

> When training with policy gradient (PG), you may need a reversed model. The reversed model is also trained on the Cornell movie-dialogs dataset, but with source and target reversed.

…
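For what it's worth, building the training set for that reversed model is just the same Cornell movie-dialogs pairs with source and target swapped; a tiny sketch with a hypothetical `dialog_pairs` list:

```python
# dialog_pairs: (utterance, reply) strings from Cornell movie-dialogs (hypothetical name)
dialog_pairs = [("how are you ?", "fine , thanks ."), ("where are we ?", "no idea .")]

# forward model learns reply given utterance; reversed model learns utterance given reply
forward_pairs = [(src, tgt) for src, tgt in dialog_pairs]
reversed_pairs = [(tgt, src) for src, tgt in dialog_pairs]
```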
-
I am trying to implement second-order gradients with tf_agents.
The reason I need a second-order gradient comes from the meta-learning algorithm [MAML](https://arxiv.org/abs/1703.03400).
First I c…
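In plain TensorFlow, the usual way to get a second-order gradient is to nest `tf.GradientTape`s and differentiate through the inner update; whether this composes cleanly with tf_agents' training loop is the open question here. A toy sketch of the MAML-style pattern:

```python
import tensorflow as tf

w = tf.Variable(2.0)
inner_lr = 0.1

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        inner_loss = w ** 3                    # toy inner-task loss
    grad = inner_tape.gradient(inner_loss, w)  # 3 * w**2
    w_adapted = w - inner_lr * grad            # inner (task) update, still a function of w
    outer_loss = w_adapted ** 2                # toy meta-loss evaluated after adaptation
# Differentiating through the inner update is what requires the second-order gradient.
meta_grad = outer_tape.gradient(outer_loss, w)
```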
-
Hello,
In the [asynchronous DQN paper](http://arxiv.org/pdf/1602.01783v1.pdf), they also described an on-policy method, the advantage actor-critic (A3C), which achieved better results than the other methods, do …
-
Hi!
Let's bring the reinforcement learning course to the whole Korean-speaking community 🌏 (currently 9 out of 77 complete)
Would you like to translate? Please follow the 🤗 [TRANSLATING guide](ht…
-
In Drake's `Distribution` class, we currently support the `Sample()` function: https://github.com/RobotLocomotion/drake/blob/40e116d44929301d261f15f4d79c0d29b1e8293f/common/schema/stochastic.h#L203-L213
…
-
I think you should use `tf.stop_gradient()` in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164. Otherwise, after some training the policy tends to use one action exclusively. Took me a …
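In case it helps anyone reading this later, the suggestion is roughly to treat the advantage as a constant in the policy loss, assuming that line is the usual actor term; `log_pi_a`, `V`, and `R` below are illustrative names, not the exact variables in a3c.py:

```python
import tensorflow as tf

def a3c_losses(log_pi_a, V, R):
    """log_pi_a: log-probs of the taken actions, V: predicted values, R: n-step returns."""
    # Without stop_gradient, the policy loss also backpropagates into the value
    # estimate through (R - V); in practice this can collapse the policy onto
    # a single action.
    advantage = tf.stop_gradient(R - V)
    policy_loss = -tf.reduce_sum(log_pi_a * advantage)
    value_loss = 0.5 * tf.reduce_sum(tf.square(R - V))  # value head still learns from R - V
    return policy_loss, value_loss
```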
-
I have been using these two routines to figure out the best learning rate to apply with awesome results on SAC. However, the changes in the `temperature` alter those values along the way. Probably wou…
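For context on why those values drift: with automatic temperature tuning, SAC adjusts alpha by gradient descent on its own loss, so the scale of the entropy term in the actor objective keeps changing during training. A sketch of the standard alpha update (TensorFlow only for illustration; names are not taken from any particular library):

```python
import tensorflow as tf

action_dim = 6                        # e.g. a 6-dim continuous action space (illustrative)
target_entropy = -float(action_dim)   # common heuristic: -|A|
log_alpha = tf.Variable(0.0)          # optimize log(alpha) so alpha stays positive

def temperature_loss(log_pi):
    """log_pi: log-probs of actions sampled from the current policy."""
    alpha = tf.exp(log_alpha)
    # Pushes alpha up when policy entropy is below the target and down when it
    # is above, so alpha (and with it the actor-loss scale) keeps moving.
    return -tf.reduce_mean(alpha * (tf.stop_gradient(log_pi) + target_entropy))
```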