-
I need that algorithm implemented here!!!
-
It occurred to me that this recent paper would be an interesting one to implement inside brax.
One of the cool things about brax is its differentiability, but as I understand it, attempts to leverage that …
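For concreteness, roughly what I have in mind is something like the sketch below: backpropagating through an entire rollout and ascending the return directly. Note that `dynamics` and `policy` here are made-up toy functions, not the actual brax API; this is only a sketch of the idea under that assumption.

```python
# Toy sketch: analytic policy gradients through a differentiable rollout.
# `dynamics` is a hypothetical differentiable step function standing in for a
# brax environment step; it is NOT brax's real API.
import jax
import jax.numpy as jnp


def dynamics(state, action):
    # Placeholder differentiable dynamics and reward.
    next_state = state + 0.1 * action
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward


def policy(params, state):
    # Linear policy, purely for illustration.
    return jnp.tanh(params @ state)


def rollout_return(params, init_state, horizon=10):
    def step(carry, _):
        state, total = carry
        action = policy(params, state)
        state, reward = dynamics(state, action)
        return (state, total + reward), None

    (_, total), _ = jax.lax.scan(step, (init_state, 0.0), None, length=horizon)
    return total


params = jnp.zeros((2, 2))
init_state = jnp.array([1.0, -1.0])
# Every operation is differentiable, so we can take gradients of the return
# with respect to the policy parameters through the whole trajectory.
grads = jax.grad(rollout_return)(params, init_state)
params = params + 1e-2 * grads
```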
-
Hello, I am using the PPO method of your program to train the spacerobot, but I have run into a problem. I use the file (PPO/Continious/PPO/main.py) to train the spacerobot, and the xml file is spacerobotstate, but …
-
Thanks for the paper; it is really cool and useful.
On page 22 of the paper, it says:
> For reincarnating D4PG using QDagger, we minimize a distillation loss between the D4PG’s actor policy and the …
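My reading of that sentence (it is cut off above, so this is only a guess) is a term roughly like the sketch below: a mean-squared error between the frozen teacher actor's actions and the student's, added to the student's usual actor loss with a decaying weight. All names here are illustrative, not taken from your code.

```python
import jax
import jax.numpy as jnp


def distillation_loss(student_params, teacher_params, student_actor, teacher_actor, obs):
    # Teacher is frozen: stop gradients so only the student's parameters move.
    teacher_actions = jax.lax.stop_gradient(teacher_actor(teacher_params, obs))
    student_actions = student_actor(student_params, obs)
    # Assumed form: MSE between deterministic actor outputs on the same batch.
    return jnp.mean(jnp.sum((student_actions - teacher_actions) ** 2, axis=-1))


# The student's objective would then be something like
#   loss = d4pg_actor_loss + lam * distillation_loss(...)
# with lam annealed towards zero as the student takes over from the teacher.
```

Is that roughly the intended loss, or does the distillation target something else?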
-
The current implementation of `ActorCriticBase` makes it a bit tricky to have custom actor and critic networks that share layers. This is because the instantiation of the networks happens in the `…
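For reference, what I would like to be able to write is roughly the following (a minimal flax sketch, not the repo's actual `ActorCriticBase`): a single module that builds the shared trunk once, so both heads train the same parameters.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class SharedActorCritic(nn.Module):
    action_dim: int

    @nn.compact
    def __call__(self, obs):
        # Shared trunk: receives gradients from both the actor and critic heads.
        h = nn.relu(nn.Dense(64)(obs))
        h = nn.relu(nn.Dense(64)(h))
        # Separate heads on top of the shared features.
        action_mean = nn.Dense(self.action_dim)(h)
        value = nn.Dense(1)(h)
        return action_mean, jnp.squeeze(value, axis=-1)


model = SharedActorCritic(action_dim=4)
params = model.init(jax.random.PRNGKey(0), jnp.zeros((1, 8)))
action_mean, value = model.apply(params, jnp.zeros((1, 8)))
```

With networks instantiated inside the base class instead, there is no obvious place to construct this shared trunk once and hand it to both heads.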
-
Hello,
In the [asynchronous dqn paper](http://arxiv.org/pdf/1602.01783v1.pdf), they also describe an on-policy method, the asynchronous advantage actor-critic (A3C), which achieved better results than the other methods; do …
-
I am confused by your code.
In the paper, it is mentioned that a policy gradient method [1] is used, but more specifically, I think it is implemented as an actor-critic.
If I am wrong, please tell m…
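To make my question concrete, here is the distinction I have in mind (a sketch with illustrative names, not your code): a plain policy-gradient (REINFORCE) update weights log-probabilities by the full Monte Carlo return, while an actor-critic weights them by an advantage built from a learned value function, which is what your code appears to do.

```python
import jax
import jax.numpy as jnp


def reinforce_loss(log_probs, returns):
    # REINFORCE: scale each log-prob by the sampled return G_t.
    return -jnp.mean(log_probs * returns)


def actor_critic_loss(log_probs, rewards, values, next_values, gamma=0.99):
    # Actor-critic: replace G_t with a bootstrapped advantage
    # A_t = r_t + gamma * V(s_{t+1}) - V(s_t), and also fit the critic.
    advantages = rewards + gamma * next_values - values
    policy_loss = -jnp.mean(log_probs * jax.lax.stop_gradient(advantages))
    value_loss = jnp.mean(advantages ** 2)
    return policy_loss + 0.5 * value_loss
```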
-
Go beyond what is taught in the unit and look at frontier research to solve the RL problem.
-
Several deep RL agents are missing, such as A2C and A3C, which could be added. Further work could also include adding MARL agents such as MAA2C or MADDPG.