IPPO trained model shows poor test performance

satpreetsingh commented 5 months ago

I trained some agents using IPPO on MPE with the benchmark code, then ran inference and generated animations with the code from this PR: https://github.com/FLAIROx/JaxMARL/pull/72

$ python ippo_rnn_mpe.py 
{'returns': -102.80365, 'env_step': 0}
{'returns': -100.726265, 'env_step': 3072}
{'returns': -104.75728, 'env_step': 6144}
{'returns': -94.75876, 'env_step': 9216}
{'returns': -89.8966, 'env_step': 12288}
{'returns': -94.21084, 'env_step': 15360}
{'returns': -92.89967, 'env_step': 18432}
{'returns': -93.211845, 'env_step': 21504}
...
{'returns': -67.70681, 'env_step': 1978368}
{'returns': -68.14425, 'env_step': 1981440}
{'returns': -68.150986, 'env_step': 1984512}
{'returns': -66.487785, 'env_step': 1987584}
{'returns': -65.92587, 'env_step': 1990656}
{'returns': -67.4093, 'env_step': 1993728}
{'returns': -66.18083, 'env_step': 1996800}

Saved: ippo_mpe_ep00.gif
Saved: ippo_mpe_ep01.gif

Training appears to converge (returns improve from about -103 to about -66 over roughly 2M environment steps), but visualizing the trained agents reveals poor policies. Animations attached: ippo_mpe_ep00.gif, ippo_mpe_ep01.gif
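For anyone who wants to check the convergence claim against the numbers, here is a minimal sketch that parses the log lines shown above and compares the mean return over the first and last few entries. The helper names (`parse_log`, `summarize`) and the window size are my own choices for illustration and are not part of JaxMARL.

```python
import ast
from statistics import mean

# Log lines copied from the training output above (the elided middle stays elided).
LOG = """\
{'returns': -102.80365, 'env_step': 0}
{'returns': -100.726265, 'env_step': 3072}
{'returns': -104.75728, 'env_step': 6144}
{'returns': -94.75876, 'env_step': 9216}
{'returns': -89.8966, 'env_step': 12288}
{'returns': -94.21084, 'env_step': 15360}
{'returns': -92.89967, 'env_step': 18432}
{'returns': -93.211845, 'env_step': 21504}
...
{'returns': -67.70681, 'env_step': 1978368}
{'returns': -68.14425, 'env_step': 1981440}
{'returns': -68.150986, 'env_step': 1984512}
{'returns': -66.487785, 'env_step': 1987584}
{'returns': -65.92587, 'env_step': 1990656}
{'returns': -67.4093, 'env_step': 1993728}
{'returns': -66.18083, 'env_step': 1996800}
"""


def parse_log(text):
    """Parse dict-literal log lines, skipping anything else (e.g. '...')."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{"):
            # literal_eval only accepts Python literals, so it is safe on log text
            records.append(ast.literal_eval(line))
    return records


def summarize(records, window=5):
    """Return (mean of first `window` returns, mean of last `window`, difference)."""
    returns = [r["returns"] for r in records]
    start = mean(returns[:window])
    end = mean(returns[-window:])
    return start, end, end - start


records = parse_log(LOG)
start, end, gain = summarize(records)
print(f"first-{5} mean: {start:.2f}, last-{5} mean: {end:.2f}, gain: {gain:.2f}")
```

A positive gain only shows that the training return improved. It says nothing about whether the rollout used for the animations matches the training setup, which is the part in question here.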

satpreetsingh commented 5 months ago

@amacrutherford Requesting an update on this issue. Thanks!

amacrutherford commented 5 months ago

Hi! Please see PR #77.

FLAIROx / JaxMARL

IPPO trained model shows poor test performance #73