-
Soft Actor-Critic (SAC) [1] is currently one of the most efficient model-free RL algorithms available. Its sample complexity is close to that of the best model-based reinforcement learning methods while still…
-
Hello,
Nice project =)
Quick question: did you try other algorithms that are usually better suited for continuous actions, like Soft Actor-Critic (SAC), DDPG, and TD3 (coming in the next release)…
-
The implementation is currently not correct. We need to figure out why and fix it.
Run with config/debug.yaml to test it. This configuration uses a very simple environment in which the goal is to m…
-
## 🚀 Feature
Could you please add a Tanh transform to the torch.distributions.transforms module?
## Motivation
The policy network used by the Soft Actor-Critic algorithm passes its output thr…
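For reference, here is a minimal sketch of what such a transform could look like, built on `torch.distributions.transforms.Transform` (the class name, the clamping constant, and the stable log-det form are illustrative choices, not a final API):

```python
import math

import torch
from torch.distributions import Normal, TransformedDistribution, constraints
from torch.distributions.transforms import Transform


class TanhTransform(Transform):
    """Sketch of a tanh bijector y = tanh(x) for use with TransformedDistribution."""
    domain = constraints.real
    codomain = constraints.interval(-1.0, 1.0)
    bijective = True
    sign = +1

    def _call(self, x):
        return torch.tanh(x)

    def _inverse(self, y):
        # atanh(y), clamped away from +/-1 for numerical stability
        y = y.clamp(-0.999999, 0.999999)
        return 0.5 * (torch.log1p(y) - torch.log1p(-y))

    def log_abs_det_jacobian(self, x, y):
        # log|d tanh(x)/dx| = log(1 - tanh(x)^2), written in a numerically stable form
        return 2.0 * (math.log(2.0) - x - torch.nn.functional.softplus(-2.0 * x))


# Usage: a tanh-squashed Gaussian, as used for SAC policies
base = Normal(torch.zeros(3), torch.ones(3))
squashed = TransformedDistribution(base, [TanhTransform()])
action = squashed.rsample()
log_prob = squashed.log_prob(action).sum(-1)  # sum over action dimensions
```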
-
Dear author,
In your implementation of Soft Actor-Critic, why is there no value function V(s)?
In the original paper of SAC, the authors said such a value function can stabilize training and is c…
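For what it's worth, many implementations follow the follow-up SAC paper ("Soft Actor-Critic Algorithms and Applications"), which drops the separate V(s) network and instead forms the critic target from the twin target Q networks and the entropy term. A rough, self-contained sketch of that target computation (the network and variable names here are illustrative, not this repo's):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

obs_dim, act_dim, batch = 4, 2, 8
qf1_target = nn.Linear(obs_dim + act_dim, 1)   # stand-ins for the target critics
qf2_target = nn.Linear(obs_dim + act_dim, 1)
alpha, gamma = 0.2, 0.99

def sample_action(obs):
    # Placeholder for a squashed-Gaussian policy: returns an action and its log-probability.
    dist = Normal(torch.zeros(obs.shape[0], act_dim), torch.ones(obs.shape[0], act_dim))
    x = dist.rsample()
    a = torch.tanh(x)
    # tanh change-of-variables correction on the log-probability
    log_prob = dist.log_prob(x).sum(-1, keepdim=True)
    log_prob -= torch.log(1.0 - a.pow(2) + 1e-6).sum(-1, keepdim=True)
    return a, log_prob

next_obs = torch.randn(batch, obs_dim)
reward = torch.randn(batch, 1)
done = torch.zeros(batch, 1)

with torch.no_grad():
    next_action, next_log_prob = sample_action(next_obs)
    q_in = torch.cat([next_obs, next_action], dim=-1)
    min_q_next = torch.min(qf1_target(q_in), qf2_target(q_in))
    # Soft value of the next state: V(s') = E[ min_i Q_i(s', a') - alpha * log pi(a'|s') ]
    next_value = min_q_next - alpha * next_log_prob
    q_target = reward + (1.0 - done) * gamma * next_value
```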
-
Respected sir,
I want to know how the parameters of the qf1_pi and qf2_pi models in sac.py are updated; qf1_pi and qf2_pi are used to find min_qf_pi in SAC,
but I could not find any loss function for th…
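If this repo follows the usual SAC structure, qf1_pi and qf2_pi are not separate models: they are the same critic networks qf1 and qf2 evaluated at actions sampled from the current policy. Their parameters are updated only by the critic loss, while min_qf_pi feeds the policy loss, where only the policy optimizer steps. A rough sketch of that pattern, with illustrative stand-in names rather than this repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, batch = 4, 2, 8
qf1 = nn.Linear(obs_dim + act_dim, 1)   # critic 1 (illustrative stand-in)
qf2 = nn.Linear(obs_dim + act_dim, 1)   # critic 2
policy = nn.Linear(obs_dim, act_dim)    # stand-in for the squashed-Gaussian policy
critic_optim = torch.optim.Adam(list(qf1.parameters()) + list(qf2.parameters()), lr=3e-4)
policy_optim = torch.optim.Adam(policy.parameters(), lr=3e-4)
alpha = 0.2

obs = torch.randn(batch, obs_dim)
act = torch.randn(batch, act_dim)
q_target = torch.randn(batch, 1)        # stands in for r + gamma * (min Q_target - alpha * log pi)

# 1) Critic update: the only place where the parameters of qf1/qf2 are changed.
qf1_loss = F.mse_loss(qf1(torch.cat([obs, act], -1)), q_target)
qf2_loss = F.mse_loss(qf2(torch.cat([obs, act], -1)), q_target)
critic_optim.zero_grad()
(qf1_loss + qf2_loss).backward()
critic_optim.step()

# 2) Policy update: qf1_pi/qf2_pi are the *same* critics evaluated at policy actions.
pi_action = torch.tanh(policy(obs))
log_pi = torch.zeros(batch, 1)          # placeholder for log pi(a|s)
qf1_pi = qf1(torch.cat([obs, pi_action], -1))
qf2_pi = qf2(torch.cat([obs, pi_action], -1))
min_qf_pi = torch.min(qf1_pi, qf2_pi)
policy_loss = (alpha * log_pi - min_qf_pi).mean()
policy_optim.zero_grad()
policy_loss.backward()                  # gradients also flow into qf1/qf2 here...
policy_optim.step()                     # ...but only the policy optimizer steps, so the critics stay put
```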
-
Cool work.
It seems that you have implemented SAC to support discrete action spaces.
I wonder whether this project contains a tiny demo of running Soft Actor-Critic on a discrete action s…
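In case it helps, here is a rough sketch (not taken from this project) of the core discrete-action SAC idea: the policy outputs a categorical distribution, so expectations over actions can be computed exactly rather than sampled, and the critic outputs one Q-value per action. All names below are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions, batch = 4, 3, 8
policy = nn.Linear(obs_dim, n_actions)   # logits over discrete actions (stand-in)
qf = nn.Linear(obs_dim, n_actions)       # Q(s, .) -> one value per action (stand-in)
alpha = 0.2

obs = torch.randn(batch, obs_dim)
logits = policy(obs)
probs = F.softmax(logits, dim=-1)
log_probs = F.log_softmax(logits, dim=-1)
q_values = qf(obs)

# Discrete SAC policy loss: exact expectation over actions instead of a sampled one.
policy_loss = (probs * (alpha * log_probs - q_values)).sum(dim=-1).mean()

# Soft state value used in the critic target, again as an exact expectation:
# V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))
value = (probs * (q_values - alpha * log_probs)).sum(dim=-1)
```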
-
Hi, are there any pointers on how to reproduce the Discrete SAC code in TF2? In particular, `torch.gather()` does not behave quite the same way as `tf.gather` or `tf.gather_nd`. Any help w…
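For the specific pattern used in discrete SAC (picking Q(s, a) out of a `[batch, num_actions]` tensor), `torch.gather(..., dim=1, ...)` can usually be reproduced with `tf.gather(..., axis=1, batch_dims=1)` or with `tf.gather_nd` plus explicit batch indices. A small sketch comparing the three (tensor names are illustrative):

```python
import numpy as np
import tensorflow as tf
import torch

q_np = np.arange(12, dtype=np.float32).reshape(4, 3)     # Q-values, shape [batch=4, num_actions=3]
a_np = np.array([[2], [0], [1], [2]], dtype=np.int64)    # chosen action per row, shape [4, 1]

# PyTorch: out[i, 0] = q[i, a[i, 0]]
q_torch = torch.gather(torch.from_numpy(q_np), dim=1, index=torch.from_numpy(a_np))

q_t = tf.constant(q_np)
a_t = tf.constant(a_np)

# TF2 option 1: tf.gather with batch_dims=1 does the same per-row lookup
q_tf = tf.gather(q_t, a_t, axis=1, batch_dims=1)

# TF2 option 2: tf.gather_nd with explicit (row, action) index pairs
rows = tf.cast(tf.range(tf.shape(q_t)[0]), tf.int64)[:, None]
q_tf_nd = tf.gather_nd(q_t, tf.concat([rows, a_t], axis=1))[:, None]

assert np.allclose(q_torch.numpy(), q_tf.numpy())
assert np.allclose(q_torch.numpy(), q_tf_nd.numpy())
```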
-
## Problem with Signal
Signal has ***copious*** privacy issues that make it unfit for endorsement by privacytools.io.
1. Users are forced to supply a phone number to Signal (https://github.com/privacy…