-
It seems that you are not applying the reparametrization trick when sampling an action:
https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/model.py#L98-L99
although you wrote it …
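For reference, a minimal sketch of the difference, using `torch.distributions.Normal` (illustrative variable names, not the ones in model.py):

```
import torch
from torch.distributions import Normal

mean, log_std = torch.zeros(3), torch.zeros(3)  # stand-ins for the policy head outputs
dist = Normal(mean, log_std.exp())

# Without the reparametrization trick: the sample is detached from the graph,
# so no gradient flows back into mean/log_std.
a_plain = dist.sample()

# With the reparametrization trick: a = mean + std * eps, eps ~ N(0, 1),
# so the sample stays differentiable w.r.t. the policy parameters.
a_reparam = dist.rsample()

# SAC then squashes the sampled action with tanh.
action = torch.tanh(a_reparam)
```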
-
https://github.com/pranz24/pytorch-soft-actor-critic/blob/master/sac.py#L87
Should we use `new_action` here, or `self.policy(next_state_batch)`?
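For context, the soft value target in the SAC paper is V(s) ≈ Q(s, a) − α·log π(a|s), with a freshly sampled from the current policy at the same states the value network is evaluated on; a schematic sketch with placeholder names (hypothetical `sample` signature, not the repo's variables):

```
# Schematic soft value target from the SAC paper (placeholder names,
# hypothetical policy.sample signature, not the code in sac.py):
new_action, log_prob = policy.sample(state_batch)            # a ~ pi(.|s)
value_target = critic(state_batch, new_action) - alpha * log_prob
```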
-
### System information
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: RedHat 6
- **Ray installed from (source or binary)**: source
- **Ray version**: master
- **Python version*…
-
A2C works well with discrete actions; however, it does not seem to train with continuous actions.
To reproduce: https://github.com/araffin/rl-baselines-zoo with LunarLanderContinuous-v2 for instanc…
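A minimal repro sketch outside the zoo, using stable-baselines directly (hyperparameters are illustrative, not the zoo's tuned ones):

```
import gym
from stable_baselines import A2C

# Continuous-action task: training stalls here, while the same setup
# learns on discrete-action envs such as LunarLander-v2.
env = gym.make("LunarLanderContinuous-v2")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200000)
```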
-
The policy loss in the HER+DDPG implementation is defined as follows:
```
self.pi_loss_tf = -tf.reduce_mean(self.main.Q_pi_tf)
self.pi_loss_tf += self.action_l2 * tf.reduce_mean(tf.square(self.ma…
```
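For readers of the snippet: the first term is the standard DDPG actor objective (ascend on the critic's Q evaluated at the policy's actions), and the truncated second term is, in the upstream baselines code, an `action_l2`-weighted quadratic penalty on the normalized actions that discourages saturated outputs. A self-contained TF1 restatement with placeholder tensors (not the baselines graph):

```
import tensorflow as tf  # TF1-style, matching the snippet above

# Placeholder tensors standing in for the baselines graph:
q_pi = tf.placeholder(tf.float32, [None, 1])   # critic's Q at the actor's actions
pi = tf.placeholder(tf.float32, [None, 4])     # actor's proposed actions
action_l2, max_u = 1.0, 1.0

# First term: maximize the critic's evaluation of the policy's actions.
pi_loss = -tf.reduce_mean(q_pi)
# Second (truncated) term: quadratic penalty on the normalized actions.
pi_loss += action_l2 * tf.reduce_mean(tf.square(pi / max_u))
```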
-
Perhaps I'm missing something basic here:
```
~/soft_actor_critic debug-hindsight
(sac) ❯ curl -OL https://github.com/fsavje/math-with-slack/releases/download/v0.2.5/math-with-slack.sh
% Total …
```
-
Installed via pip, running Python 3.6 and TensorFlow 1.9.
The environment is HalfCheetah-v2 (though also observed in other environments).
`env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=…
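For completeness, a minimal sketch of the setup described above (stable-baselines requires wrapping in a vectorized env before `VecNormalize`; the `clip_obs` value is a placeholder since the original one is truncated):

```
import gym
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

env = DummyVecEnv([lambda: gym.make("HalfCheetah-v2")])
# norm_obs/norm_reward as in the report; clip_obs=10.0 is a placeholder
# for the truncated value above.
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.0)
```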
-
Hello, I want to write a continuous-action-space policy for PPO2 with a small constraint: the output should be a vector whose components sum to 1.0 and are each >= 0.0.
Until now I have been processing …
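One common way to satisfy that constraint without touching PPO2's Gaussian policy head is to let the policy emit an unconstrained vector and map it onto the simplex with a softmax inside the environment; a sketch of the idea (not a claim about what the author currently does):

```
import numpy as np

def to_simplex(raw_action):
    """Map an unconstrained action vector onto the probability simplex
    (all components >= 0, summing to 1) via a numerically stable softmax."""
    z = raw_action - np.max(raw_action)
    e = np.exp(z)
    return e / e.sum()

# Example: a raw 3-dimensional action from the Gaussian policy
print(to_simplex(np.array([0.5, -1.2, 2.0])))  # ~ [0.18, 0.03, 0.79]
```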
-
For soft actor-critic, how do we specify the reward scale?
How do we specify it in `examples/sac.py`?
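For context, in the original SAC formulation the reward scale is just a multiplier applied to the environment reward before the soft Bellman backup (a larger scale effectively lowers the entropy temperature, yielding a more deterministic policy). A placeholder-name sketch of where it enters, not the actual code in `examples/sac.py`:

```
def soft_q_target(reward, done, next_state, reward_scale, gamma, target_value):
    """Placeholder-name sketch: the reward scale multiplies the raw reward
    before the soft Bellman backup (not the code in examples/sac.py)."""
    return reward_scale * reward + (1.0 - done) * gamma * target_value(next_state)
```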