
Implementation and performance questions #228

Closed: mmcaulif closed this issue 11 months ago

mmcaulif commented 1 year ago

I just have some questions about your implementation of MAPPO.

  1. Are there any other major changes in your implementation besides the use of convolutional networks? I know the paper used RNNs, n-step learning, and parallel environments.
  2. Have you compared the performance of your selected hyperparameters against other MAPPO implementations?

Just asking because solving the simple environments in ~150k timesteps is significantly better than anything I could achieve in my own research while tuning MAPPO, so I'm hoping for some tips/pointers :)

Denys88 commented 1 year ago

Hi @mmcaulif, my baseline was implemented a long time ago using TensorFlow: https://github.com/Denys88/rl_games/tree/0871084d8d95954fa165dbe93eadb54773b7a36a. The main feature is that I just stacked the last 4 frames and used conv1d; a rough sketch of that idea is below.
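
A minimal sketch of that frame-stacking + conv1d idea (illustrative only, not the repo's actual code; the network sizes, `obs_dim`, and the choice to treat observation features as channels are all assumptions):

```python
import torch
import torch.nn as nn

class StackedConv1dNet(nn.Module):
    """Policy head over the last 4 stacked observations, processed by Conv1d."""

    def __init__(self, obs_dim: int, num_actions: int, stack: int = 4):
        super().__init__()
        # Conv1d treats observation features as channels and the
        # stacked frames as the sequence dimension.
        self.conv = nn.Sequential(
            nn.Conv1d(obs_dim, 64, kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 64 * (stack - 2)  # sequence length shrinks by 1 per k=2 conv
        self.logits = nn.Linear(conv_out, num_actions)

    def forward(self, stacked_obs: torch.Tensor) -> torch.Tensor:
        # stacked_obs: (batch, obs_dim, stack) — the last 4 observations
        return self.logits(self.conv(stacked_obs))

net = StackedConv1dNet(obs_dim=32, num_actions=10)
print(net(torch.randn(8, 32, 4)).shape)  # torch.Size([8, 10])
```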

I have a lot of different PPO experiments in PyTorch, including a central value function and LSTM, but there are cases where my old implementation or the MAPPO paper does better.
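
For readers unfamiliar with the central value idea: here is a hedged sketch of a MAPPO-style centralized critic, where the value network sees all agents' observations (a stand-in for a true global state) while each actor sees only its own. The names and sizes are illustrative assumptions, not the rl_games API:

```python
import torch
import torch.nn as nn

class CentralValue(nn.Module):
    """Centralized critic: one value estimate from all agents' observations."""

    def __init__(self, obs_dim: int, n_agents: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single shared value head
        )

    def forward(self, all_obs: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents, obs_dim), flattened into one critic input
        return self.net(all_obs.flatten(start_dim=1))

critic = CentralValue(obs_dim=32, n_agents=3)
print(critic(torch.randn(8, 3, 32)).shape)  # torch.Size([8, 1])
```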

In the MAPPO paper they made some pretty interesting improvements which I didn't implement in my repo: global state tuning and death masking.
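
Death masking, roughly, means zeroing out a dead agent's slice of the critic input so stale features don't pollute the value estimate. A hedged sketch (the exact masking scheme used in the paper may differ; this zero-masking variant is an assumption):

```python
import torch

def death_mask(all_obs: torch.Tensor, alive: torch.Tensor) -> torch.Tensor:
    # all_obs: (batch, n_agents, obs_dim); alive: (batch, n_agents) in {0, 1}
    # Broadcasting the alive flag zeroes every feature of dead agents.
    return all_obs * alive.unsqueeze(-1)

obs = torch.randn(8, 3, 32)
alive = torch.tensor([[1, 1, 0]] * 8, dtype=torch.float32)  # agent 2 is dead
masked = death_mask(obs, alive)
print(masked[0, 2].abs().sum())  # tensor(0.) — dead agent's features zeroed
```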

Overall, this SC2 benchmark is pretty strange and might depend a lot on the initial action distribution. For example, if moving left has the highest probability for every unit under an untrained neural network, it might make training much faster.
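
One quick way to see this effect is to inspect an untrained policy head's action distribution directly; with a standard init the probabilities sit near uniform, but an unlucky draw or biased init can make one action dominate from step 0. A small illustrative check (layer sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Linear(32, 10)   # untrained policy head over 10 discrete actions
obs = torch.randn(1024, 32)  # a batch of random observations
probs = torch.softmax(policy(obs), dim=-1).mean(dim=0)
print(probs)                 # how far from uniform (0.1 each) at initialization?
```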