agakshat / maddpg

Implementation of Multi-Agent Deep Deterministic Policy Gradients
35 stars 9 forks source link

Training #8

Closed emanuelepesce closed 6 years ago

emanuelepesce commented 6 years ago

Usually how long it takes for training? Currently it has been running for almost 20 hours (around 5700 episodes), but it doesn't look like it is converging.

What happens is that the green agent tend to go out of the screen, while the red agents go around for a while. Is this normal or do I have to wait longer?

agakshat commented 6 years ago

This repository is a work in (gradual) progress. I haven't been able to get it to converge yet either. I'm currently trying out various modifications like parameter sharing, stabilising experience replay etc. and seeing if anything works. Right now I have set the episode to end only when all of the agents go out of the screen. If you want, you can change it to end when even one agent goes out (or the green agent goes out), but in my experience that was ending episodes way too early and even preventing any learning.

P.S. The original authors of the paper had not implemented any such ending, and said that eventually the agents just learned to stay within the screen to maximize their reward.

emanuelepesce commented 6 years ago

Ok, great, thanks for your support. I will keep you up of date in case I should find any solutions or improvements.