Denys88 / rl_games

RL implementations
MIT License

Can not reproduce SMAC results. #222

Closed — ZiyiLiubird closed this issue 1 year ago

ZiyiLiubird commented 1 year ago

Hello, I cannot reproduce the SMAC experimental results (e.g., corridor and 3m) illustrated in https://github.com/Denys88/rl_games/blob/master/docs/SMAC.md with the same hyperparameters.

It seems that IPPO can't learn meaningful policies even on 3m (and corridor) after 10M steps: the win rate stays below 0.2 from beginning to end.
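To make "win rate stays below 0.2" concrete, here is a minimal sketch of how one might track a rolling win rate over finished episodes. It assumes a SMAC-style environment whose terminal `info` dict carries a boolean `battle_won` flag; the class name and wiring are hypothetical, not part of rl_games.

```python
from collections import deque


class WinRateTracker:
    """Rolling win rate over the last `window` finished episodes.

    Hypothetical helper: assumes the env's terminal `info` dict
    exposes a boolean `battle_won` key (as SMAC-style envs do);
    adapt the key name to your wrapper.
    """

    def __init__(self, window=100):
        self.results = deque(maxlen=window)  # 1.0 = win, 0.0 = loss

    def on_episode_end(self, info):
        # Record the outcome of one finished episode.
        self.results.append(1.0 if info.get("battle_won", False) else 0.0)

    @property
    def win_rate(self):
        # Fraction of wins in the current window; 0.0 before any episode ends.
        return sum(self.results) / len(self.results) if self.results else 0.0


if __name__ == "__main__":
    tracker = WinRateTracker(window=4)
    for won in (True, False, True, True):
        tracker.on_episode_end({"battle_won": won})
    print(tracker.win_rate)  # 3 wins out of the last 4 episodes
```

A tracker like this, logged every few hundred episodes, makes it easy to see whether training is flat (as reported here) or actually improving.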

Denys88 commented 1 year ago

Hi, that sounds strange. Btw, the original paper used the TF implementation; I'll try to find that checkpoint. But 3m should be solved in less than 2 minutes, with far fewer steps.

Denys88 commented 1 year ago

https://github.com/Denys88/rl_games/tree/0871084d8d95954fa165dbe93eadb54773b7a36a — this commit was used in the paper (TensorFlow 1 implementation). You should be able to reproduce it.
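For anyone following along, pinning the repository to that exact commit can be done with standard git commands (the clone URL and commit hash are taken from the link above; this checks out a detached HEAD at the paper's state of the code):

```shell
# Clone the repo and check out the exact commit used for the paper
# (TensorFlow 1 implementation).
git clone https://github.com/Denys88/rl_games.git
cd rl_games
git checkout 0871084d8d95954fa165dbe93eadb54773b7a36a
```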

For the PyTorch version, I'll install the SMAC envs over the weekend and take a look at what is wrong.

ZiyiLiubird commented 1 year ago

Thanks a lot! I'm looking forward to hearing back from you.

Denys88 commented 1 year ago

@ZiyiLiubird everything is fixed. There was a small issue with reporting rewards. Thanks! Please take a look at the different configs. BUT if you want the exact same results as in the paper, you need to use the old TensorFlow implementation: there I used conv1d, while in these tests I tried MLP + LSTM and different env configurations. It would be really nice if someone found good configurations in PyTorch with a central value too.

ZiyiLiubird commented 1 year ago

Hi @Denys88, thank you for your patience and advice, and for providing such excellent work!