-
https://www.usenix.org/conference/osdi20/presentation/qiu
-
### 🚀 Feature
Independently configurable learning rates for the actor and the critic in AC-style algorithms
### Motivation
In the literature, the actor is often configured to learn more slowly, such that the c…
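A minimal sketch of what this could look like in PyTorch, assuming the actor and critic are separate submodules; the module layout, layer sizes, and the `3e-5`/`1e-3` values are illustrative assumptions, not the project's actual API:

```python
import torch

# Hedged sketch: per-group learning rates so the actor can learn more
# slowly than the critic. All names and sizes here are illustrative.
class ActorCritic(torch.nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.actor = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, act_dim),
        )
        self.critic = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, 1),
        )

model = ActorCritic(obs_dim=8, act_dim=2)

# One optimizer, two parameter groups with independent learning rates.
optimizer = torch.optim.Adam([
    {"params": model.actor.parameters(), "lr": 3e-5},   # actor learns slower
    {"params": model.critic.parameters(), "lr": 1e-3},  # critic learns faster
])
```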
-
Thank you for your impressive work! I have successfully run the project, and I have some questions.
The algorithms in GRF_MARL, including MAPPO, HAPPO, and MAT, are implemented with the model. I want to ad…
-
Comments for https://www.endpointdev.com/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/
By Kamil Ciemniewski
To enter a comment:
1. Log in to GitHub
2. Leave a comment on this issue…
-
I am conducting reinforcement learning for a robot using rsl_rl and isaac lab. While it works fine with simple settings, when I switch to more complex settings (such as Domain Randomization), the foll…
-
Hello guys, I wonder if there is a way to train the Actor Critic algorithms in an off-policy manner, as in the paper [Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/abs/1611.0…
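For reference, the core device that lets ACER reuse replayed transitions is a truncated importance weight ρ̄ = min(c, π(a|s)/μ(a|s)) applied to the policy-gradient term. Below is a hedged sketch of just that correction, leaving out the Retrace(λ) returns, the bias-correction term, and the trust-region update from the paper; the function name and arguments are illustrative:

```python
import torch

def truncated_is_actor_loss(log_prob_new: torch.Tensor,
                            log_prob_behavior: torch.Tensor,
                            advantage: torch.Tensor,
                            c: float = 10.0) -> torch.Tensor:
    """Policy-gradient term with a truncated importance weight, in the
    spirit of ACER (Wang et al., 2016). Sketch only.

    log_prob_new:      log pi(a|s) under the current policy
    log_prob_behavior: log mu(a|s) stored in the replay buffer at collection
    advantage:         advantage estimate for the replayed transition
    c:                 truncation constant (the paper uses c = 10)
    """
    rho = (log_prob_new - log_prob_behavior).exp()   # importance ratio
    rho_bar = rho.clamp(max=c).detach()              # truncate to bound variance
    return -(rho_bar * log_prob_new * advantage.detach()).mean()
```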
-
I want to make a project using reinforcement learning in which a bot sends scam messages to other bots on social media, and the other bots detect the scam and reject it.
I think it needs a deep reinforcement learning…
-
In `obj_alpha = (self.alpha_log * (self.target_entropy - log_prob).detach()).mean()`, when `alpha_log = 0`, alpha will stay 1 forever.
The correct way is `obj_alpha = (self.alpha * (self.target_entropy - log…
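For context, a minimal sketch of the proposed fix, assuming a PyTorch setup where `alpha_log` is the learnable log-temperature; computing the loss on `alpha = alpha_log.exp()` keeps the gradient scaled by the current temperature (the optimizer and learning rate are assumptions, not the original code):

```python
import torch

# Hedged sketch of the fix described in the issue: the temperature loss is
# computed on alpha = exp(alpha_log) rather than on alpha_log directly.
alpha_log = torch.zeros(1, requires_grad=True)   # alpha starts at exp(0) = 1
optimizer = torch.optim.Adam([alpha_log], lr=3e-4)

def update_alpha(log_prob: torch.Tensor, target_entropy: float) -> torch.Tensor:
    alpha = alpha_log.exp()                      # keep the graph through exp()
    obj_alpha = (alpha * (target_entropy - log_prob).detach()).mean()
    optimizer.zero_grad()
    obj_alpha.backward()
    optimizer.step()
    return alpha.detach()
```

Both the `alpha_log` and `alpha` formulations appear in public SAC implementations; the practical difference is whether the update magnitude is scaled by the current temperature.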
-
Hello. Thank you for your amazing work. I appreciate the efforts to provide a unified library of MARL algorithms and environments for benchmarking and reproducibility. To better achieve this goal, I s…