-
```python
self.Critic_return, self.advantage = trfl.sequence_advantage_critic_loss(
    self.baseline_, self.reward_, self.discount_, self.bootstrap_, lambda_=lambda_,
…
```
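For reference, a NumPy sketch of the quantity a loss like this is built around: λ-returns computed backward through time, with the advantage as their difference from the baseline and the critic loss as the squared advantage. This is a generic λ-return computation, not TRFL's actual implementation, and the names below are illustrative.

```python
import numpy as np

def lambda_returns(rewards, pcontinues, values, bootstrap, lambda_=1.0):
    """rewards, pcontinues, values: arrays of shape [T]; bootstrap: scalar V(s_T)."""
    T = len(rewards)
    returns = np.empty(T)
    next_return = bootstrap   # G_T is the bootstrap value
    next_value = bootstrap    # V(s_T)
    for t in reversed(range(T)):
        # Mix the one-step target with the recursive lambda-return.
        returns[t] = rewards[t] + pcontinues[t] * (
            (1.0 - lambda_) * next_value + lambda_ * next_return
        )
        next_return = returns[t]
        next_value = values[t]
    return returns

rewards = np.array([1.0, 1.0, 1.0])
pcontinues = np.array([0.99, 0.99, 0.99])
values = np.array([0.5, 0.6, 0.7])

returns = lambda_returns(rewards, pcontinues, values, bootstrap=0.8, lambda_=0.9)
advantage = returns - values                  # weights the policy gradient
critic_loss = 0.5 * np.sum(advantage ** 2)    # trains the baseline
```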
-
Hi, I would like to contribute to rllab. Could you please tell me where to start? Can you give me a task, or a list of tasks, to start contributing with?
Background: I am a final-year B.Tech …
-
I wonder whether LSTM + PPO/SAC can be used in Tianshou? I ask because I have run into some problems.
-
Hi!
Let's bring the reinforcement learning course to the whole Korean-speaking community 🌏 (currently 9 out of 77 complete).
Would you like to translate? Please follow the 🤗 [TRANSLATING guide](ht…
-
Hi!
Let's bring the reinforcement learning course to the whole Russian-speaking community 🌏
Would you like to translate? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/tran…
-
I noticed that, across many implementations of actor-critic policies, the Rollout/Buffer/Trajectories object is inconsistent, in that some authors send the arrays to the device as tensors during insertio…
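For concreteness, here is a sketch of the two conventions being contrasted; the class and method names are illustrative, not taken from any particular repo.

```python
import numpy as np
import torch

class InsertTimeBuffer:
    """Stores transitions as tensors already on the target device."""
    def __init__(self, device="cpu"):
        self.device = torch.device(device)
        self.obs = []

    def insert(self, obs):
        # Host-to-device copy happens once per environment step.
        self.obs.append(torch.as_tensor(obs, dtype=torch.float32, device=self.device))

    def sample(self):
        return torch.stack(self.obs)  # already on device

class SampleTimeBuffer:
    """Stores transitions as NumPy arrays; moves to device only when sampling."""
    def __init__(self, device="cpu"):
        self.device = torch.device(device)
        self.obs = []

    def insert(self, obs):
        self.obs.append(np.asarray(obs, dtype=np.float32))

    def sample(self):
        # One batched copy to the device at sample time.
        return torch.as_tensor(np.stack(self.obs), device=self.device)
```

Converting at insertion time pays a small transfer per step but keeps sampling cheap; storing NumPy arrays keeps the environment loop framework-agnostic and batches the copy at sample time.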
-
## Describe the bug
I'm not quite sure whether this is supported behavior, but if I set `functional=True` for the A2C loss and `shifted=True` for `TD0Estimator`, I get an internal error.
## To Reproduce
…
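For context, a minimal sketch of the combination being described might look like the following. The network shapes, keys, and the `TanhNormal` policy are illustrative assumptions; only `functional=True` and `shifted=True` come from the report, and `make_value_estimator` is assumed to forward `shifted` to `TD0Estimator`.

```python
from torch import nn
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import ProbabilisticActor, TanhNormal, ValueOperator
from torchrl.objectives import A2CLoss, ValueEstimators

# Toy actor: maps a 4-d observation to loc/scale of a TanhNormal policy.
policy_net = TensorDictModule(
    nn.Sequential(nn.Linear(4, 4), NormalParamExtractor()),
    in_keys=["observation"],
    out_keys=["loc", "scale"],
)
actor = ProbabilisticActor(
    policy_net, in_keys=["loc", "scale"], distribution_class=TanhNormal
)
# Toy critic producing "state_value".
critic = ValueOperator(nn.Linear(4, 1), in_keys=["observation"])

loss_fn = A2CLoss(actor, critic, functional=True)
# Build the TD(0) value estimator with shifted=True, as in the report.
loss_fn.make_value_estimator(ValueEstimators.TD0, shifted=True)
# Calling loss_fn(...) on a rollout tensordict is then what triggers
# the internal error described above.
```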
-
I didn't change anything in `8_Actor_Critic_Advantage/AC_CartPole.py`. I just ran it, but I got this:
```
RuntimeError: one of the variables needed for gradient computation has been modified by …
```
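For context, this error usually means a tensor that autograd saved for the backward pass was later modified in place. A minimal, self-contained sketch of the mechanism (illustrative only, not the script's actual code):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()         # exp() saves its output y for the backward pass
y += 1              # in-place op bumps y's version counter
y.sum().backward()  # RuntimeError: one of the variables needed for gradient
                    # computation has been modified by an inplace operation
```

In actor-critic code the usual culprit is stepping one optimizer (an in-place parameter update) before the other loss has been backpropagated; computing both losses before any optimizer step, or recomputing/detaching the shared values, avoids it.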
-
Hi again. I finally found some time to continue with your book. This time I ran into a problem in chapters 10 and 12, where you have the policy and the actor-critic agents (the same problem occurs for both). Aft…
-
For reference, we will collect in this issue a list of the papers discussed, along with the date of each discussion.