-
Hi
In the text of **The advantages and disadvantages of policy-gradient methods**, you meant to write `policy-based methods`, but due to a typo you wrote `value-based methods`:
> Advantages
> Ther…
-
Hello everyone,
I've encountered a problem while implementing an A2C (Advantage Actor-Critic) network involving Flax and Optax. My network includes _policy_network_ and _value_network_, each containi…
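For anyone with a similar setup, here is a minimal sketch of two separate Flax modules, each with its own Optax optimizer state. The module names, layer sizes, and learning rates are assumptions, since the original definitions are cut off above:
```python
import jax
import jax.numpy as jnp
import flax.linen as nn
import optax

class PolicyNetwork(nn.Module):
    n_actions: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(self.n_actions)(x)  # action logits

class ValueNetwork(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(1)(x)  # scalar state value

key = jax.random.PRNGKey(0)
dummy_obs = jnp.zeros((1, 4))  # placeholder observation batch

policy_network = PolicyNetwork(n_actions=2)
value_network = ValueNetwork()
policy_params = policy_network.init(key, dummy_obs)
value_params = value_network.init(key, dummy_obs)

# one optimizer (and optimizer state) per parameter tree
policy_tx = optax.adam(3e-4)
value_tx = optax.adam(1e-3)
policy_opt_state = policy_tx.init(policy_params)
value_opt_state = value_tx.init(value_params)
```
Keeping the two parameter trees and optimizer states separate like this is one common layout; a single module with two heads and one optimizer is another.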
-
### System Info
```Shell
- `Accelerate` version: 0.31.0
- Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.35
- `accelerate` bash location:
- Python version: 3.10.12
- Numpy version: 1.2…
```
-
Hey @CR-Gjx Thanks for providing this open source code. Very helpful to study and I love the idea of hierarchical reinforcement learning.
In the recent AlphaGo Zero paper and [Thinking Fast and Slo…
-
When running the code on DMC, because `actor_grad` is `dynamics`, `loss_policy` would be `-value_target`. `value_target` is not dependent on the actor's policy distribution, and so, `lo…
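For readers following along, here is a toy JAX sketch (made-up names and shapes; not the repository's code) of the gradient path implied by `actor_grad = dynamics`: even though `-value_target` contains no log-probability term, it can still pass gradients to the actor by backpropagating through the imagined states and (reparameterized) actions.
```python
import jax
import jax.numpy as jnp

def dynamics(state, action):
    # stand-in for the learned, differentiable world model
    return jnp.tanh(state + action)

def actor(params, state):
    # deterministic stand-in for a reparameterized action sample
    return jnp.tanh(params["w"] * state)

def critic(state):
    # stand-in for the learned value function
    return jnp.sum(state ** 2)

def loss_policy(params, state, horizon=3):
    # imagine a short rollout and use the final value as value_target
    for _ in range(horizon):
        state = dynamics(state, actor(params, state))
    return -critic(state)  # loss_policy = -value_target

params = {"w": jnp.array(0.5)}
state = jnp.ones(4)
# nonzero gradient: the actor is updated through the dynamics, not the distribution
print(jax.grad(loss_policy)(params, state))
```
Whether this matches the DMC configuration being discussed depends on how `value_target` is computed there, which the excerpt above cuts off.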
-
I've noticed that the REINFORCE algorithm (aka policy gradient, without the Q function) is not listed in the list of agents, not even among the not-yet-implemented ones. I presume this was intentional? How co…
-
This is not an issue with the code per se, but I am learning RL and am wondering how the policy gradient is calculated in `pg_reinforce.py`. In this line:
```
self.cross_entropy_loss = tf.nn.sparse_so…
```
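As I understand it (my reading, not the repository's documentation): the cross-entropy of the taken action is exactly `-log pi(a|s)`, so weighting it by the return and averaging gives a surrogate loss whose gradient is the negative REINFORCE policy gradient, and minimizing it performs gradient ascent on expected return. A small JAX sketch of the same computation:
```python
import jax
import jax.numpy as jnp

def reinforce_surrogate(logits, actions, returns):
    # cross-entropy of the taken action == -log pi(a|s)
    log_probs = jax.nn.log_softmax(logits)
    neg_log_prob = -jnp.take_along_axis(log_probs, actions[:, None], axis=1)[:, 0]
    # weighting by the return gives the REINFORCE surrogate loss
    return jnp.mean(returns * neg_log_prob)

# toy batch: 2 steps, 2 discrete actions
logits = jnp.array([[1.0, 2.0], [0.5, -0.5]])
actions = jnp.array([0, 1])
returns = jnp.array([1.5, -0.3])

# gradient of the surrogate is -E[G_t * grad log pi(a_t|s_t)],
# so minimizing it ascends the expected return
print(jax.grad(reinforce_surrogate)(logits, actions, returns))
```
As far as I can tell, the quoted line plays the role of `neg_log_prob` here.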
-
Please make sure that this is a bug. As per our
[GitHub Policy](https://github.com/tensorflow/tensorflow/blob/master/ISSUES.md),
we only address code/doc bugs, performance issues, feature requests a…
-
https://arxiv.org/abs/1611.01626