-
It seems that you are not applying the reparametrization trick when sampling an action:
https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/model.py#L98-L99
although you wrote it …
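For reference, a minimal sketch of the difference, using `torch.distributions.Normal` (illustrative variable names, not the ones in model.py):

```
import torch
from torch.distributions import Normal

mean, log_std = torch.zeros(3), torch.zeros(3)  # stand-ins for the policy head outputs
dist = Normal(mean, log_std.exp())

# Without the reparametrization trick: the sample is detached from the graph,
# so no gradient flows back into mean/log_std.
a_plain = dist.sample()

# With the reparametrization trick: a = mean + std * eps, eps ~ N(0, 1),
# so the sample stays differentiable w.r.t. the policy parameters.
a_reparam = dist.rsample()

# SAC then squashes the sampled action with tanh.
action = torch.tanh(a_reparam)
```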
-
https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L87
Should we use `new_action` here, or `self.policy(next_state_batch)`?
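For context, the soft value target in the SAC paper is V(s) ≈ Q(s, a) − α·log π(a|s), with a freshly sampled from the current policy at the same states the value network is evaluated on; a schematic sketch with placeholder names (hypothetical `sample` signature, not the repo's variables):

```
# Schematic soft value target from the SAC paper (placeholder names,
# hypothetical policy.sample signature, not the code in sac.py):
new_action, log_prob = policy.sample(state_batch)            # a ~ pi(.|s)
value_target = critic(state_batch, new_action) - alpha * log_prob
```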
-
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: RedHat 6
- **Ray installed from (source or binary)**: source
- **Ray version**: master
- **Python version*…
-
A2C works well with discrete actions; however, it does not seem to train with continuous actions.
To reproduce: https://github.com/araffin/rl-baselines-zoo with LunarLanderContinuous-v2 for instanc…
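A minimal repro sketch outside the zoo, using stable-baselines directly (hyperparameters are illustrative, not the zoo's tuned ones):

```
import gym
from stable_baselines import A2C

# Continuous-action task: training stalls here, while the same setup
# learns on discrete-action envs such as LunarLander-v2.
env = gym.make("LunarLanderContinuous-v2")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200000)
```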
-
The policy loss in the HER+DDPG implementation is defined as follows:
```
self.pi_loss_tf = -tf.reduce_mean(self.main.Q_pi_tf)
self.pi_loss_tf += self.action_l2 * tf.reduce_mean(tf.square(self.ma…
```
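For readers of the snippet: the first term is the standard DDPG actor objective (ascend on the critic's Q evaluated at the policy's actions), and the truncated second term is, in the upstream baselines code, an `action_l2`-weighted quadratic penalty on the normalized actions that discourages saturated outputs. A self-contained TF1 restatement with placeholder tensors (not the baselines graph):

```
import tensorflow as tf  # TF1-style, matching the snippet above

# Placeholder tensors standing in for the baselines graph:
q_pi = tf.placeholder(tf.float32, [None, 1])   # critic's Q at the actor's actions
pi = tf.placeholder(tf.float32, [None, 4])     # actor's proposed actions
action_l2, max_u = 1.0, 1.0

# First term: maximize the critic's evaluation of the policy's actions.
pi_loss = -tf.reduce_mean(q_pi)
# Second (truncated) term: quadratic penalty on the normalized actions.
pi_loss += action_l2 * tf.reduce_mean(tf.square(pi / max_u))
```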
-
Perhaps I'm missing something basic here:
```
~/soft_actor_critic debug-hindsight
(sac) ❯ curl -OL https://github.com/fsavje/math-with-slack/releases/download/v0.2.5/math-with-slack.sh
% Total …
```
-
Installed via pip, running Python 3.6 and TensorFlow 1.9.
The environment is HalfCheetah-v2 (though also observed in other environments).
`env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=…
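For completeness, a minimal sketch of the setup described above (stable-baselines requires wrapping in a vectorized env before `VecNormalize`; the `clip_obs` value is a placeholder since the original one is truncated):

```
import gym
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

env = DummyVecEnv([lambda: gym.make("HalfCheetah-v2")])
# norm_obs/norm_reward as in the report; clip_obs=10.0 is a placeholder
# for the truncated value above.
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.0)
```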
-
Hello, I want to write a continuous-action-space policy for PPO2 with a small constraint: the output should be a vector whose components sum to 1.0 and are each >= 0.0.
Until now I have been processing …
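One common way to satisfy that constraint without touching PPO2's Gaussian policy head is to let the policy emit an unconstrained vector and map it onto the simplex with a softmax inside the environment; a sketch of the idea (not a claim about what the author currently does):

```
import numpy as np

def to_simplex(raw_action):
    """Map an unconstrained action vector onto the probability simplex
    (all components >= 0, summing to 1) via a numerically stable softmax."""
    z = raw_action - np.max(raw_action)
    e = np.exp(z)
    return e / e.sum()

# Example: a raw 3-dimensional action from the Gaussian policy
print(to_simplex(np.array([0.5, -1.2, 2.0])))  # ~ [0.18, 0.03, 0.79]
```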
-
For soft actor-critic, how do we specify the reward scale?
How do we specify it in `examples/sac.py`?
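For context, in the original SAC formulation the reward scale is just a multiplier applied to the environment reward before the soft Bellman backup (a larger scale effectively lowers the entropy temperature, yielding a more deterministic policy). A placeholder-name sketch of where it enters, not the actual code in `examples/sac.py`:

```
def soft_q_target(reward, done, next_state, reward_scale, gamma, target_value):
    """Placeholder-name sketch: the reward scale multiplies the raw reward
    before the soft Bellman backup (not the code in examples/sac.py)."""
    return reward_scale * reward + (1.0 - done) * gamma * target_value(next_state)
```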