-
Yu-zx updated
4 months ago
-
@Kismuz,
I believe I have encountered a framework (A3C) limitation.
While training a few of my recent models I noticed a strange behavior. For the first part of training everything seems to work fi…
-
Dear all,
Thank you for the framework. Please see the output of hyperparameter tuning with an SB3 algorithm: why does the reward not change in any episode? What is the problem? (I copied only three outputs.) The …
-
### 🚀 Feature
Hi!
I would like to implement a recurrent soft actor-critic. Is it a sensible contribution?
### Motivation
I actually need this algorithm in my projects.
### Pitch
The sb3 e…
-
I want to add an attention mechanism to the MADDPG network; could you tell me which .py file to modify? This question has been bothering me for a long time, and I would appreciate it if you could solve the…
-
In line 276 of CCM_MADDPG.py, I wonder why it is " newactor_action_var = self.actors[agent_id](states_var[:, agent_id, :]" instead of "newactor_action_var = self.actors[agent_id](next_states_var[:, agent_id…
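For context, the standard MADDPG critic target computes the actions fed to the target critic from the *next* states via the target actors, which is why `states_var` at that point looks suspicious. A minimal sketch of the usual target, with illustrative names not taken from the repository:

```python
# Hypothetical sketch of the conventional MADDPG critic target; all names
# (critic_target, target_actors, ...) are illustrative, not from CCM_MADDPG.py.
def critic_target(reward, done, gamma, target_actors, target_critic, next_states):
    # Target actions are computed from the NEXT states by the target actors...
    next_actions = [actor(s) for actor, s in zip(target_actors, next_states)]
    # ...and the target critic scores those next states with those actions:
    # y = r + gamma * (1 - done) * Q'(s', a'_1, ..., a'_N)
    return reward + gamma * (1.0 - done) * target_critic(next_states, next_actions)
```

If the repository intentionally feeds current states there, that would be a deviation from this form worth documenting.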
-
Implement automatic tuning of the entropy temperature parameter and reproduce the results from [Soft Actor-Critic Algorithms and Applications](https://arxiv.org/abs/1812.05905).
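For reference, the temperature update in that paper minimizes J(α) = E[−α (log π(a|s) + H̄)] for a target entropy H̄, commonly set to −|A| for continuous action spaces. A framework-free sketch with illustrative names, faking the policy log-probabilities and taking the gradient of J with respect to log α by hand:

```python
import math
import random

# Hypothetical sketch of SAC automatic temperature tuning; names are illustrative.
random.seed(0)
action_dim = 4
target_entropy = -float(action_dim)   # common heuristic: -|A|
log_alpha = 0.0                       # optimize log(alpha) so alpha stays positive
lr = 1e-2

for step in range(200):
    # Log-probs of sampled actions would come from the policy; faked here.
    log_probs = [random.gauss(-3.0, 0.5) for _ in range(64)]
    # J(alpha) = E[-exp(log_alpha) * (log pi + target_entropy)]
    # dJ/dlog_alpha = -exp(log_alpha) * mean(log pi + target_entropy)
    mean_term = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -math.exp(log_alpha) * mean_term
    log_alpha -= lr * grad            # plain gradient descent step

alpha = math.exp(log_alpha)  # use as the entropy coefficient in the SAC losses
```

Here the faked policy entropy exceeds the target, so α is driven down; in a real implementation the same loss is minimized with an optimizer over `log_alpha`.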
-
As the title says: the PPO and PPO2 algorithms both use an actor-critic structure, but I can't find it in this code.
Does this really implement the PPO2 algorithm?
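For what it's worth, many PPO implementations realize the actor-critic structure as a single network with two output heads (policy logits and state value) rather than two separate classes, so the word "critic" may not appear literally in the code. A framework-free sketch of that layout, with illustrative names:

```python
import random

# Hypothetical sketch of the two-headed actor-critic layout common in PPO
# implementations; names and the linear "layers" are illustrative only.
class ActorCritic:
    def __init__(self, obs_dim, n_actions):
        # Actor head: one weight row per discrete action.
        self.actor_w = [[random.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(n_actions)]
        # Critic head: a single row producing the scalar state value.
        self.critic_w = [random.uniform(-0.1, 0.1) for _ in range(obs_dim)]

    def forward(self, obs):
        # Policy logits (actor) and state value (critic) from the same input.
        logits = [sum(w * o for w, o in zip(row, obs)) for row in self.actor_w]
        value = sum(w * o for w, o in zip(self.critic_w, obs))
        return logits, value
```

When auditing a PPO codebase, looking for a value head and a value-loss term is usually a quicker check than searching for a class named "Critic".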
-
First, understand the MAPPO algorithm.
-
There are recurrent (LSTM) policy options for sb3 (e.g. [RecurrentPPO](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py)). It w…