
Implementation and performance questions #228

Closed: mmcaulif closed this issue 11 months ago

mmcaulif commented 1 year ago

I just have some questions about your implementation of MAPPO.

  1. Are there any other major changes in your implementation besides the use of convolutional networks? I know the paper used RNNs, n-step learning, and parallel environments.
  2. Have you compared the performance of your selected hyperparameters against other MAPPO implementations?

Just asking because solving the simple environments in ~150k timesteps is significantly better than anything I could achieve in my own research while tuning MAPPO, so I'm hoping for some tips/pointers :)

Denys88 commented 1 year ago

Hi @mmcaulif, my baseline was implemented a long time ago using TensorFlow: https://github.com/Denys88/rl_games/tree/0871084d8d95954fa165dbe93eadb54773b7a36a. The main feature is that I just stacked the last 4 frames and used conv1d; a rough sketch of that idea is below.
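
A minimal sketch of that frame-stacking + conv1d idea (illustrative only, not the repo's actual code; the network sizes, `obs_dim`, and the choice to treat observation features as channels are all assumptions):

```python
import torch
import torch.nn as nn

class StackedConv1dNet(nn.Module):
    """Policy head over the last 4 stacked observations, processed by Conv1d."""

    def __init__(self, obs_dim: int, num_actions: int, stack: int = 4):
        super().__init__()
        # Conv1d treats observation features as channels and the
        # stacked frames as the sequence dimension.
        self.conv = nn.Sequential(
            nn.Conv1d(obs_dim, 64, kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 64 * (stack - 2)  # sequence length shrinks by 1 per k=2 conv
        self.logits = nn.Linear(conv_out, num_actions)

    def forward(self, stacked_obs: torch.Tensor) -> torch.Tensor:
        # stacked_obs: (batch, obs_dim, stack) — the last 4 observations
        return self.logits(self.conv(stacked_obs))

net = StackedConv1dNet(obs_dim=32, num_actions=10)
print(net(torch.randn(8, 32, 4)).shape)  # torch.Size([8, 10])
```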

I have a lot of different PPO experiments in PyTorch, including a central value function and LSTM, but there are cases where my old implementation or the MAPPO paper does better.
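
For readers unfamiliar with the central value idea: here is a hedged sketch of a MAPPO-style centralized critic, where the value network sees all agents' observations (a stand-in for a true global state) while each actor sees only its own. The names and sizes are illustrative assumptions, not the rl_games API:

```python
import torch
import torch.nn as nn

class CentralValue(nn.Module):
    """Centralized critic: one value estimate from all agents' observations."""

    def __init__(self, obs_dim: int, n_agents: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single shared value head
        )

    def forward(self, all_obs: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents, obs_dim), flattened into one critic input
        return self.net(all_obs.flatten(start_dim=1))

critic = CentralValue(obs_dim=32, n_agents=3)
print(critic(torch.randn(8, 3, 32)).shape)  # torch.Size([8, 1])
```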

In the MAPPO paper they made some pretty interesting improvements which I didn't implement in my repo: global state tuning and death masking.
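
Death masking, roughly, means zeroing out a dead agent's slice of the critic input so stale features don't pollute the value estimate. A hedged sketch (the exact masking scheme used in the paper may differ; this zero-masking variant is an assumption):

```python
import torch

def death_mask(all_obs: torch.Tensor, alive: torch.Tensor) -> torch.Tensor:
    # all_obs: (batch, n_agents, obs_dim); alive: (batch, n_agents) in {0, 1}
    # Broadcasting the alive flag zeroes every feature of dead agents.
    return all_obs * alive.unsqueeze(-1)

obs = torch.randn(8, 3, 32)
alive = torch.tensor([[1, 1, 0]] * 8, dtype=torch.float32)  # agent 2 is dead
masked = death_mask(obs, alive)
print(masked[0, 2].abs().sum())  # tensor(0.) — dead agent's features zeroed
```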

Overall, this SC2 benchmark is pretty strange and might depend a lot on the initial action distribution. For example, if moving left has the highest probability for every unit under an untrained neural network, it might make training much faster.
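
One quick way to see this effect is to inspect an untrained policy head's action distribution directly; with a standard init the probabilities sit near uniform, but an unlucky draw or biased init can make one action dominate from step 0. A small illustrative check (layer sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Linear(32, 10)   # untrained policy head over 10 discrete actions
obs = torch.randn(1024, 32)  # a batch of random observations
probs = torch.softmax(policy(obs), dim=-1).mean(dim=0)
print(probs)                 # how far from uniform (0.1 each) at initialization?
```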