florin-pop closed this issue 3 months ago.
Hello!
This is because MADDPG might need some hyperparameter tuning. The VMAS parameters have been tuned for MADDPG in the `fine_tuned` folder, which is why you see VMAS working. In particular, here are some points:
- Is there a reason why you need to use PettingZoo MPE? If not, I would suggest using the VMAS MPE: `python fine_tuned/vmas/vmas_run.py task=vmas/simple_reference algorithm=maddpg` should work. With VMAS you also benefit from the speed of vectorized simulation.
- If you cannot use VMAS for some reason, then you need to tune MADDPG for PettingZoo MPE. I would do so taking some inspiration from the MADDPG parameters used in the VMAS `fine_tuned` config.
- You are not seeing the agents explore because the videos come from evaluation, which is performed with deterministic actions. To see them explore, set `evaluation_deterministic_actions=False`.
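For reference, here is a minimal sketch of how that flag could be set from the command line. It assumes BenchMARL's Hydra-style dotted overrides and that the flag lives under the `experiment` config group (as suggested by the config dump later in this thread); both assumptions are worth verifying:

```bash
# Hypothetical Hydra-style override; assumes the flag sits under the
# `experiment` group, as in the config dump later in this thread.
python benchmarl/run.py \
    task=pettingzoo/simple_reference \
    algorithm=maddpg \
    experiment.evaluation_deterministic_actions=False
```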
Thank you for the lightning fast response!
I would definitely prefer to use VMAS/MPE over PettingZoo/MPE due to its speed. The reason I started with PettingZoo was that I wanted a baseline as a starting point for our experiments, and my understanding from reading https://github.com/proroklab/VectorizedMultiAgentSimulator/issues/62 was that you used the VMAS fine-tuned parameters for training agents in PettingZoo as well, so I copied those parameters.
`evaluation_deterministic_actions: False` solved the problem of seeing the agents explore the PettingZoo environment. Is this not needed for VMAS/MPE with MADDPG? I noticed that it's True by default.
VMAS hyperparameters will not work out of the box for PettingZoo; they might require some adjustments. I think in that post I used VMAS.
Also, it is crucial that parameter sharing is OFF here: from your config, you definitely want `share_param_critic: false` in the algorithm config.
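As a sketch of how that might be passed on the command line, assuming the same Hydra-style dotted overrides as above (the `algorithm.share_param_critic` key path is inferred from how the keys are grouped in the config dump below, so please verify it against the MADDPG algorithm config in the repo):

```bash
# Hypothetical Hydra-style override disabling critic parameter sharing;
# key path assumed from the config structure, not verified.
python benchmarl/run.py \
    task=pettingzoo/simple_reference \
    algorithm=maddpg \
    algorithm.share_param_critic=False
```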
> `evaluation_deterministic_actions: False` solved the problem of seeing the agents explore the PettingZoo environment. Is this not needed for VMAS/MPE with MADDPG? I noticed that it's True by default.
You generally do not want your agents to explore during evaluation; this is why it is True by default. I suggested turning it off because you asked to see them explore. Note that this has no impact at all on training.
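Training-time exploration is instead governed by the annealed-epsilon settings that appear under `experiment` in the config dump below; as a hedged sketch (same caveat as above about the exact key paths), those could be adjusted like this:

```bash
# Hypothetical overrides for training-time exploration annealing; the key
# names mirror the `experiment` section of the config dump below.
python benchmarl/run.py \
    task=pettingzoo/simple_reference \
    algorithm=maddpg \
    experiment.exploration_eps_init=0.8 \
    experiment.exploration_eps_end=0.01 \
    experiment.exploration_anneal_frames=1000000
```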
Thanks Matteo! The agents are learning the task using MADDPG in the VMAS/MPE environment with the following config:
Loaded config:

```yaml
experiment:
  sampling_device: cuda
  train_device: cuda
  share_policy_params: false
  prefer_continuous_actions: true
  gamma: 0.9
  lr: 5.0e-05
  adam_eps: 1.0e-06
  clip_grad_norm: true
  clip_grad_val: 5.0
  soft_target_update: true
  polyak_tau: 0.005
  hard_target_update_frequency: 5
  exploration_eps_init: 0.8
  exploration_eps_end: 0.01
  exploration_anneal_frames: 1000000
  max_n_iters: null
  max_n_frames: 10000000
  on_policy_collected_frames_per_batch: 60000
  on_policy_n_envs_per_worker: 600
  on_policy_n_minibatch_iters: 45
  on_policy_minibatch_size: 4096
  off_policy_collected_frames_per_batch: 6000
  off_policy_n_envs_per_worker: 60
  off_policy_n_optimizer_steps: 1000
  off_policy_train_batch_size: 128
  off_policy_memory_size: 1000000
  off_policy_init_random_frames: 0
  evaluation: true
  render: true
  evaluation_interval: 120000
  evaluation_episodes: 200
  evaluation_deterministic_actions: true
  loggers:
  - wandb
  create_json: true
  save_folder: artifacts
  restore_file: null
  checkpoint_interval: 300000.0
algorithm:
  share_param_critic: true
  loss_function: l2
  delay_value: true
  use_tanh_mapping: true
task:
  max_steps: 100
model:
  name: mlp
  num_cells:
  - 256
  - 256
  layer_class: torch.nn.Linear
  activation_class: torch.nn.Tanh
  activation_kwargs: null
  norm_class: null
  norm_kwargs: null
critic_model:
  name: mlp
  num_cells:
  - 256
  - 256
  layer_class: torch.nn.Linear
  activation_class: torch.nn.Tanh
  activation_kwargs: null
  norm_class: null
  norm_kwargs: null
seed: 0
maddpg_config:
  share_param_critic: false
```
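If it helps with reproduction, the two parameter-sharing settings discussed above could, under the same unverified Hydra-override assumption as earlier, be supplied on the command line rather than edited into the YAML:

```bash
# Hypothetical reproduction of the non-default sharing settings from the
# config above; key paths assumed, not verified.
python fine_tuned/vmas/vmas_run.py \
    task=vmas/simple_reference \
    algorithm=maddpg \
    experiment.share_policy_params=False \
    algorithm.share_param_critic=False
```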
https://github.com/facebookresearch/BenchMARL/assets/13623704/dd37abc6-3505-410f-80d9-34840dcb6fa0
The issue can be marked as resolved.
Lovely! Feel free to reach out in case you face more issues!
Thank you for the amazing work you've put into VMAS and BenchMARL.
I tried and failed to reproduce the results from https://github.com/proroklab/VectorizedMultiAgentSimulator/issues/62 and I am not sure if I am missing some important piece of configuration.
When I run:
`python benchmarl/run.py task=pettingzoo/simple_reference algorithm=mappo`
the logs appear to indicate that the training works: the reward increases until it reaches a plateau, and the videos show the agents moving. However, when I try to use MADDPG by running:
`python benchmarl/run.py task=pettingzoo/simple_reference algorithm=maddpg`

the training process proceeds, but the agents are not exploring the environment or communicating. This is not specific to this task; it happens for simple_spread as well. If I try to use MADDPG with VMAS/navigation, it works as expected. I experimented with different versions of PettingZoo and BenchMARL, but it didn't seem to make a difference, so I think I may be missing something. My config is the default:
the training process proceeds, but the agents are not exploring the environment or communicating. This is not singular to this task, as it happens for simple_spread as well. If I try to use MADDPG with VMAS/nagivation it works as expected. I experimented with different versions of PettingZoo and benchMARL, but it didn't seem to make a difference so I'm thinking that I may be missing something.My config is the default: