facebookresearch / BenchMARL

A collection of MARL benchmarks based on TorchRL
https://benchmarl.readthedocs.io/
MIT License

MADDPG Config #86

Closed · florin-pop closed this issue 3 months ago

florin-pop commented 3 months ago

Thank you for the amazing work you've put into VMAS and BenchMARL.

I tried and failed to reproduce the results from https://github.com/proroklab/VectorizedMultiAgentSimulator/issues/62 and I am not sure if I am missing some important piece of configuration.

When I run python benchmarl/run.py task=pettingzoo/simple_reference algorithm=mappo, the logs appear to indicate that training works: the reward increases until it reaches a plateau, and the videos show the agents moving.

However, when I try to use MADDPG by running python benchmarl/run.py task=pettingzoo/simple_reference algorithm=maddpg, the training process proceeds, but the agents do not explore the environment or communicate. This is not specific to this task; it happens for simple_spread as well. If I use MADDPG with VMAS/navigation, it works as expected. I experimented with different versions of PettingZoo and BenchMARL, but it didn't seem to make a difference, so I think I may be missing something.

My config is the default:

algorithm_config:
  desc: null
  value:
    delay_value: true
    loss_function: l2
    share_param_critic: true
    use_tanh_mapping: true
algorithm_name:
  desc: null
  value: maddpg
continuous_actions:
  desc: null
  value: true
environment_name:
  desc: null
  value: pettingzoo
experiment_config:
  desc: null
  value:
    adam_eps: 1.0e-06
    checkpoint_interval: 300000.0
    clip_grad_norm: true
    clip_grad_val: 5.0
    create_json: true
    evaluation: true
    evaluation_deterministic_actions: true
    evaluation_episodes: 200
    evaluation_interval: 60000
    exploration_anneal_frames: 1000000
    exploration_eps_end: 0.01
    exploration_eps_init: 0.8
    gamma: 0.9
    hard_target_update_frequency: 5
    loggers:
    - wandb
    lr: 5.0e-05
    max_n_frames: 10000000
    max_n_iters: null
    off_policy_collected_frames_per_batch: 6000
    off_policy_init_random_frames: 0
    off_policy_memory_size: 1000000
    off_policy_n_envs_per_worker: 60
    off_policy_n_optimizer_steps: 1000
    off_policy_train_batch_size: 128
    on_policy_collected_frames_per_batch: 60000
    on_policy_minibatch_size: 4096
    on_policy_n_envs_per_worker: 600
    on_policy_n_minibatch_iters: 45
    polyak_tau: 0.005
    prefer_continuous_actions: true
    render: true
    restore_file: null
    sampling_device: cuda
    save_folder: artifacts
    share_policy_params: false
    soft_target_update: true
    train_device: cuda
model_config:
  desc: null
  value:
    activation_class: torch.nn.modules.activation.Tanh
    activation_kwargs: null
    layer_class: torch.nn.modules.linear.Linear
    norm_class: null
    norm_kwargs: null
    num_cells:
    - 256
    - 256
model_name:
  desc: null
  value: mlp
on_policy:
  desc: null
  value: false
seed:
  desc: null
  value: 0
task_config:
  desc: null
  value:
    continuous_actions: true
    local_ratio: 0.5
    max_cycles: 100
    task: simple_reference_v3
task_name:
  desc: null
  value: simple_reference
matteobettini commented 3 months ago

Hello!

This is likely because it needs some hyperparameter tuning. The VMAS parameters have been tuned for MADDPG in the fine_tuned folder, and that is why you see VMAS working. In particular, here are some points:

florin-pop commented 3 months ago

Thank you for the lightning fast response!

I would definitely prefer to use VMAS/MPE over PettingZoo/MPE because of its speed. The reason I started with PettingZoo is that I wanted a baseline as a starting point for our experiments, and my understanding from reading https://github.com/proroklab/VectorizedMultiAgentSimulator/issues/62 was that you used the VMAS fine-tuned parameters for training agents in PettingZoo as well, so I copied those parameters.

Setting evaluation_deterministic_actions: False solved the problem of not seeing the agents explore the PettingZoo environment. Is this not needed for VMAS/MPE with MADDPG? I noticed that it's True by default.

matteobettini commented 3 months ago

> I would definitely prefer to use VMAS/MPE over PettingZoo/MPE because of its speed. The reason I started with PettingZoo is that I wanted a baseline as a starting point for our experiments, and my understanding from reading proroklab/VectorizedMultiAgentSimulator#62 was that you used the VMAS fine-tuned parameters for training agents in PettingZoo as well, so I copied those parameters.

The VMAS hyperparameters will not work out of the box for PettingZoo; they might require some adjustments. I think in that post I used VMAS.

Also, it is crucial that parameter sharing is OFF here: from your config, you definitely want share_param_critic: false in the algorithm config.
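
(For reference, this can be set straight from the command line with a Hydra-style override, e.g. python benchmarl/run.py task=pettingzoo/simple_reference algorithm=maddpg algorithm.share_param_critic=False experiment.share_policy_params=False, following the override syntax used elsewhere in BenchMARL; these are just the two settings discussed in this thread, not a tuned configuration.)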

> Setting evaluation_deterministic_actions: False solved the problem of not seeing the agents explore the PettingZoo environment. Is this not needed for VMAS/MPE with MADDPG? I noticed that it's True by default.

You generally do not want your agents to explore during evaluation, which is why it is True by default. I suggested turning it off because you asked to see them explore. Note that this has no impact at all on training.
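
For reference, a minimal sketch of how the two settings above could be applied through the Python API (class and field names are taken from the BenchMARL README and the config dump earlier in this thread; treat it as an illustration rather than the exact setup used here):

```python
# Sketch: MADDPG on PettingZoo simple_reference with parameter sharing disabled.
from benchmarl.algorithms import MaddpgConfig
from benchmarl.environments import PettingZooTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.share_policy_params = False  # no parameter sharing across agent policies
# Only affects what you see in the evaluation rollouts/videos, not training:
experiment_config.evaluation_deterministic_actions = False

algorithm_config = MaddpgConfig.get_from_yaml()
algorithm_config.share_param_critic = False  # one critic per agent instead of a shared one

experiment = Experiment(
    task=PettingZooTask.SIMPLE_REFERENCE.get_from_yaml(),
    algorithm_config=algorithm_config,
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=experiment_config,
)
experiment.run()
```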

florin-pop commented 3 months ago

Thanks Matteo, the agents are learning the task using MADDPG in the VMAS/MPE environment with the following config:


Loaded config:

experiment:
  sampling_device: cuda
  train_device: cuda
  share_policy_params: false
  prefer_continuous_actions: true
  gamma: 0.9
  lr: 5.0e-05
  adam_eps: 1.0e-06
  clip_grad_norm: true
  clip_grad_val: 5.0
  soft_target_update: true
  polyak_tau: 0.005
  hard_target_update_frequency: 5
  exploration_eps_init: 0.8
  exploration_eps_end: 0.01
  exploration_anneal_frames: 1000000
  max_n_iters: null
  max_n_frames: 10000000
  on_policy_collected_frames_per_batch: 60000
  on_policy_n_envs_per_worker: 600
  on_policy_n_minibatch_iters: 45
  on_policy_minibatch_size: 4096
  off_policy_collected_frames_per_batch: 6000
  off_policy_n_envs_per_worker: 60
  off_policy_n_optimizer_steps: 1000
  off_policy_train_batch_size: 128
  off_policy_memory_size: 1000000
  off_policy_init_random_frames: 0
  evaluation: true
  render: true
  evaluation_interval: 120000
  evaluation_episodes: 200
  evaluation_deterministic_actions: true
  loggers:
  - wandb
  create_json: true
  save_folder: artifacts
  restore_file: null
  checkpoint_interval: 300000.0
algorithm:
  share_param_critic: true
  loss_function: l2
  delay_value: true
  use_tanh_mapping: true
task:
  max_steps: 100
model:
  name: mlp
  num_cells:
  - 256
  - 256
  layer_class: torch.nn.Linear
  activation_class: torch.nn.Tanh
  activation_kwargs: null
  norm_class: null
  norm_kwargs: null
critic_model:
  name: mlp
  num_cells:
  - 256
  - 256
  layer_class: torch.nn.Linear
  activation_class: torch.nn.Tanh
  activation_kwargs: null
  norm_class: null
  norm_kwargs: null
seed: 0
maddpg_config:
  share_param_critic: false
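
For reference, a config like this should also be reachable from the command line with Hydra-style overrides, e.g. python benchmarl/run.py task=vmas/<task_name> algorithm=maddpg algorithm.share_param_critic=False experiment.share_policy_params=False experiment.evaluation_interval=120000 (the VMAS task key is not shown in this dump; the available names should correspond to the YAML files under benchmarl/conf/task/vmas/).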

https://github.com/facebookresearch/BenchMARL/assets/13623704/dd37abc6-3505-410f-80d9-34840dcb6fa0

The issue can be marked as resolved.
matteobettini commented 3 months ago

Lovely! Feel free to reach out in case you face more issues!