Implementation of Multi-Agent Deep Deterministic Policy Gradients (MADDPG).
It has been tested with the simple tag environment from the multiagent-particle-envs repo released by OpenAI. That version, however, does not put bounds on the environment and does not implement a done callback, so every episode runs for the full 1000 steps even after all the agents have gone out of bounds - which happens constantly and (in my opinion) slows down training. I have added that done callback (for the simple tag environment only, though doing it for the other scenarios should be straightforward); a sketch of what such a callback looks like is shown below. Please install my fork of the multiagent-particle-envs repository to use this repository properly.
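For reference, in multiagent-particle-envs a done callback is just a method on the scenario class that receives an agent and the world and returns a boolean, and `MultiAgentEnv` accepts it as its `done_callback` argument. Below is a minimal sketch of what such a bounds check could look like; the bound of 1.0 and the exact condition are illustrative assumptions, not necessarily what my fork does.

```python
import numpy as np

# Illustrative sketch of a done callback for the simple tag scenario
# (added as a method of the Scenario class in multiagent/scenarios/simple_tag.py).
def done(self, agent, world):
    bound = 1.0  # assumed arena half-width; the real value is a design choice
    # End the episode for this agent once it leaves the [-bound, bound] box.
    return bool(np.any(np.abs(agent.state.p_pos) > bound))
```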
Main Requirements:

- Python 3
- My fork of the multiagent-particle-envs repository (see above)

How to use:

- Make sure the `multiagent-particle-envs` repo is installed, which means that `import make_env` in Python 3 should be working (a quick sanity check is sketched below).
- `python3 multiagent.py` should run straight out of the box.
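As a quick sanity check that the install worked, something along these lines should run, assuming the repo's `make_env.make_env(scenario_name)` helper is importable; the prints are only illustrative:

```python
import make_env

# Build the simple tag (predator-prey) environment from multiagent-particle-envs.
env = make_env.make_env('simple_tag')
obs_n = env.reset()  # one observation array per agent
print('agents:', env.n)
print('observation shapes:', [o.shape for o in obs_n])
```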
Code Breakdown:
- `training-code.py` is the entry point; it takes in user arguments for learning rates, episode length, discount factor, etc., creates the actor and critic networks for each agent, and calls the training function.
- `Train.py` implements the actual MADDPG algorithm.
- `actorcriticv2.py` defines the Actor and Critic network classes.
- `ReplayMemory.py` defines the Replay Memory class (a generic sketch of such a buffer is included after this list).
- `ExplorationNoise.py` defines the Ornstein-Uhlenbeck action noise used for exploration (also sketched after this list). I'm not sure whether this is the right noise generation process to use.
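For context, the replay memory in MADDPG only needs to store joint transitions (all agents' observations, actions, rewards, next observations and done flags) and hand back uniformly sampled minibatches. A generic sketch of such a buffer, not necessarily the exact interface of `ReplayMemory.py`:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of joint transitions (obs, actions, rewards, next_obs, dones)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch; zip(*...) regroups the fields across the batch.
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))

    def __len__(self):
        return len(self.buffer)
```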
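The Ornstein-Uhlenbeck process is the exploration noise suggested in the original DDPG paper; it produces temporally correlated noise that is added to the deterministic actor's output. A minimal sketch, using the commonly quoted parameter values rather than whatever `ExplorationNoise.py` actually uses:

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu = mu * np.ones(size)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.reset()

    def reset(self):
        # Start each episode from the long-run mean.
        self.x = np.copy(self.mu)

    def __call__(self):
        # One Euler-Maruyama step of the OU process.
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(len(self.x))
        self.x = self.x + dx
        return self.x
```

The per-step output is simply added to each agent's action during training; whether temporally correlated OU noise is actually better than plain uncorrelated Gaussian noise is an open question, and later DDPG-style implementations often just use Gaussian noise.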
To-Do:

- `simple_tag` environment, might be easier to learn. If anyone does this, please let me know of the results you got!