FLAIROx / JaxMARL

Multi-Agent Reinforcement Learning with JAX
Apache License 2.0

fix bug with rnn policy reconstruction, remove dependence on num_steps, fix smax conic obs space size #77

Closed amacrutherford closed 5 months ago
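For context on the title: the RNN policies in the baselines are typically built around a scanned GRU whose sequence length comes from the input rollout itself rather than a separate num_steps value. The snippet below is a minimal sketch of that pattern under that assumption; class and field names are illustrative, and it is not the exact diff in this PR.

```python
import functools
import jax
import jax.numpy as jnp
import flax.linen as nn


class ScannedRNN(nn.Module):
    """GRU scanned over the time axis of the input; the sequence length is
    taken from the input itself, not from a separate num_steps config."""
    hidden_dim: int = 128

    @functools.partial(
        nn.scan,
        variable_broadcast="params",
        in_axes=0,
        out_axes=0,
        split_rngs={"params": False},
    )
    @nn.compact
    def __call__(self, carry, x):
        obs, resets = x  # obs: (batch, obs_dim), resets: (batch,)
        # Re-initialise the hidden state wherever an episode boundary occurred.
        carry = jnp.where(
            resets[:, None],
            self.initialize_carry(obs.shape[0], self.hidden_dim),
            carry,
        )
        new_carry, y = nn.GRUCell(features=self.hidden_dim)(carry, obs)
        return new_carry, y

    @staticmethod
    def initialize_carry(batch_size, hidden_dim):
        return jnp.zeros((batch_size, hidden_dim))


# Usage: the time dimension is whatever length the rollout produced.
obs_seq = jnp.zeros((32, 8, 10))              # (time, batch, obs_dim)
resets = jnp.zeros((32, 8), dtype=bool)       # episode-boundary flags
model = ScannedRNN(hidden_dim=64)
init_carry = ScannedRNN.initialize_carry(8, 64)
params = model.init(jax.random.PRNGKey(0), init_carry, (obs_seq, resets))
final_carry, hidden_seq = model.apply(params, init_carry, (obs_seq, resets))
```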

amacrutherford commented 5 months ago

SMAX & MPE results are good. Hanabi is only good for IPPO; something weird is going on with MAPPO.

SMAX results

These are over 10 seeds; the shaded region shows the max and min win rate.

[Plots: SMAX_3s_vs_5z, SMAX_3s5z_vs_3s6z, SMAX_3s5z, SMAX_5m_vs_6m, SMAX_6h_vs_8z, SMAX_10m_vs_11m, SMAX_27m_vs_30m, SMAX_smacv2_5_units, SMAX_smacv2_10_units, SMAX_smacv2_20_units]
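Multi-seed curves like these are usually produced by vmapping the jitted training function over RNG keys and then taking the min/max over the seed axis for the shaded region. A minimal sketch of that pattern, with a placeholder train function standing in for the actual baseline:

```python
import jax

# Sketch of the 10-seed evaluation pattern: vmap a training function over RNG
# keys, then take max/min over the seed axis for the shaded region.
# `train` is a stand-in for the baseline's make_train(config) output.
def train(rng):
    # Placeholder: returns a fake win-rate curve instead of actually training.
    return {"win_rate": jax.random.uniform(rng, (100,))}

rngs = jax.random.split(jax.random.PRNGKey(0), 10)   # 10 seeds
outs = jax.jit(jax.vmap(train))(rngs)
win_rate = outs["win_rate"]                           # (seeds, eval points)
upper, lower = win_rate.max(axis=0), win_rate.min(axis=0)  # shaded-region bounds
```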

MPE

Returns for MPE simple spread.

[Plot: mpe_simple_spread]

Hanabi

Top lines are IPPO, bottom lines are MAPPO. @mttga and I will take a look at the cause of this.

[Plot: hanabi]

mttga commented 5 months ago

I see you've reduced the number of envs for SMAX. How long does it take to train now? Nice to see good performance on MPE btw; it makes sense to increase the total timesteps there for IPPO. I think the original MAPPO paper reports results at 2e7 timesteps on this env.

amacrutherford commented 5 months ago

> I see you've reduced the number of envs for SMAX. How long does it take to train now? Nice to see good performance on MPE btw; it makes sense to increase the total timesteps there for IPPO. I think the original MAPPO paper reports results at 2e7 timesteps on this env.

Ah yep, will put that back up to 128; it was at 16 for comparisons with Mava, as they use 16 envs across 8 workers. Yes it is! Ah cool, will increase to that for the submission.
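For reference, the values being discussed map onto the usual baseline config keys roughly as follows. This is just a sketch of the settings mentioned in the thread, not the final committed config:

```python
# Hypothetical excerpt of the MPE IPPO/MAPPO baseline config under discussion;
# key names follow the usual JaxMARL convention, values are the ones from this thread.
config = {
    "NUM_ENVS": 128,         # restored from 16 (16 was only for the Mava comparison)
    "TOTAL_TIMESTEPS": 2e7,  # MPE simple spread, matching the original MAPPO paper
}
```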