Closed ZiyiLiubird closed 1 year ago
Hi, that sounds strange. Btw, the original paper used the TF implementation. I'll try to find that checkpoint. But 3m should be solved in less than 2 minutes with far fewer steps.
https://github.com/Denys88/rl_games/tree/0871084d8d95954fa165dbe93eadb54773b7a36a this commit was used in the paper (TensorFlow 1 implementation). You should be able to reproduce it with it.
For the PyTorch version, I'll install the SMAC envs over the weekend and take a look at what is wrong.
Thanks a lot! I'm looking forward to hearing back from you.
@ZiyiLiubird everything is fixed. There was a small issue with reporting rewards. Thanks! Please take a look at the different configs. BUT if you want to get the exact same results as in the paper, you need to use the old TensorFlow implementation. There I used conv1d; in these tests I tried mlp + lstm and different env configurations. It would be really nice if someone found good configurations in PyTorch with a central value too.
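For anyone unsure what the conv1d feature extractor mentioned above does differently from an mlp, here is a minimal, dependency-free sketch of a 1-D convolution applied over a temporal stack of observation features. This is not rl_games code; the function name, kernel, and data are made up purely for illustration.

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation, as in DL libraries)
    of a scalar sequence with a small kernel."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]

# A stack of 5 per-step scalar features (e.g. one observation
# dimension over the last 5 env steps) and a 3-tap difference
# kernel: the output responds to *changes* across time, a temporal
# structure that an mlp on the flattened stack must learn from scratch.
obs_stack = [0.0, 0.0, 1.0, 1.0, 1.0]
kernel = [-1.0, 0.0, 1.0]
print(conv1d(obs_stack, kernel))  # -> [1.0, 1.0, 0.0]
```

The same weight sharing across time steps is one plausible reason the conv1d network and the mlp + lstm setup reach different results on SMAC maps.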
Hi @Denys88 Thank you for your patience and advice, and for such excellent work!
Hello, I cannot reproduce the SMAC experimental results (e.g., corridor and 3m) reported in "https://github.com/Denys88/rl_games/blob/master/docs/SMAC.md" with the same hyperparameters.
It seems that IPPO can't learn meaningful policies even on 3m (and corridor) after 10M steps, and the win rate stays below 0.2 from beginning to end.