FLAIROx / JaxMARL

Multi-Agent Reinforcement Learning with JAX
Apache License 2.0

Hanabi obl pytorch #63

Closed ravihammond closed 7 months ago

ravihammond commented 7 months ago

PyTorch OBL now works.

hnekoeiq commented 7 months ago

Hi @ravihammond and @mttga,

Thanks for the great repo! I was wondering whether you have been able to reproduce the Hanabi IQL/VDN results. I just tried the config file qlearn_hanabi.yaml (python baselines/QLearning/iql.py +alg=qlearn_hanabi +env=hanabi), and the following is the agent's performance after almost 200 million timesteps: [image]

According to the original Hanabi paper, 100 million steps should be enough to reach a score of around 20. I'd appreciate it if you could share your thoughts.

mttga commented 7 months ago

Hi @hnekoeiq, no, we are still working on that. Our IQL and VDN implementations are baselines for simple environments, while the original C++ ones are much more sophisticated and use many tricks. In the meantime, you can use IPPO, which is fast and converges.