Closed by ravihammond 7 months ago
Hi @ravihammond and @mttga,
Thanks for the great repo!
I was wondering if you have been able to reproduce the Hanabi IQL/VDN results? I just tried with the config file qlearn_hanabi.yaml (python baselines/QLearning/iql.py +alg=qlearn_hanabi +env=hanabi)
and the following is the agent's performance after almost 200 million timesteps:
Looking at the original Hanabi paper, 100 million steps should be enough to reach a score of around 20. I'd appreciate it if you could share your thoughts.
Hi @hnekoeiq, no, we are still working on that. Our IQL/VDN implementations are baselines for simple environments, while the original C++ ones are much more sophisticated and use many tricks. Meanwhile, you can use IPPO, which is fast and converges.
PyTorch OBL now works.