Replicable-MARL / MARLlib

One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)
https://marllib.readthedocs.io
MIT License
deep-reinforcement-learning multi-agent-reinforcement-learning pytorch ray rllib

MARLlib: A Multi-agent Reinforcement Learning Library


:exclamation: News
March 2023 :anchor: We are excited to announce that a major update has just been released. For detailed version information, please refer to the version info.
May 2023 Exciting news! MARLlib now supports five more tasks: MATE, GoBigger, Overcooked-AI, MAPDN, and AirCombat. Give them a try!
June 2023 OpenAI: Hide and Seek and SISL environments are incorporated into MARLlib.
Aug 2023 :tada: MARLlib has been accepted for publication in JMLR.
Sept 2023 The latest PettingZoo with Gymnasium is compatible with MARLlib.
Nov 2023 We are currently in the process of creating a hands-on MARL book and aim to release the draft by the end of 2023.

Multi-agent Reinforcement Learning Library (MARLlib) is a MARL library built on Ray and RLlib, one of Ray's toolkits. It offers a comprehensive platform for developing, training, and testing MARL algorithms across various tasks and environments.

Here's an example of how MARLlib can be used:

from marllib import marl

# prepare env
env = marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)

# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source='mpe')

# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-256"})

# start training
mappo.fit(env, model, stop={'timesteps_total': 1000000}, share_policy='group')

Why MARLlib?

Here we provide a table for the comparison of MARLlib and existing work.

| Library | Supported Env | Algorithm | Parameter Sharing | Model | Documentation |
| :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| PyMARL | 1 cooperative | 5 | share | GRU | :x: |
| PyMARL2 | 2 cooperative | 11 | share | MLP + GRU | :x: |
| MAPPO Benchmark | 4 cooperative | 1 | share + separate | MLP + GRU | :x: |
| MAlib | 4 self-play | 10 | share + group + separate | MLP + LSTM | Documentation Status |
| EPyMARL | 4 cooperative | 9 | share + separate | GRU | :x: |
| HARL | 8 cooperative | 9 | share + separate | MLP + CNN + GRU | :x: |
| MARLlib | 17 no task mode restriction | 18 | share + group + separate + customizable | MLP + CNN + GRU + LSTM | Documentation Status |

| Library | GitHub Stars | Documentation | Issues Open | Activity | Last Update |
| :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| PyMARL | GitHub stars | :x: | GitHub opened issue | GitHub commit-activity | GitHub last commit |
| PyMARL2 | GitHub stars | :x: | GitHub opened issue | GitHub commit-activity | GitHub last commit |
| MAPPO Benchmark | GitHub stars | :x: | GitHub opened issue | GitHub commit-activity | GitHub last commit |
| MAlib | GitHub stars | Documentation Status | GitHub opened issue | GitHub commit-activity | GitHub last commit |
| EPyMARL | GitHub stars | :x: | GitHub opened issue | GitHub commit-activity | GitHub last commit |
| HARL* | GitHub stars | :x: | GitHub opened issue | GitHub commit-activity | GitHub last commit |
| MARLlib | GitHub stars | Documentation Status | GitHub opened issue | GitHub commit-activity | GitHub last commit |

* HARL is the latest MARL library that has been recently released :fire:. If cutting-edge MARL algorithms with state-of-the-art performance are your target, HARL is definitely worth a look!

Key features

:beginner: MARLlib offers several key features that make it stand out:

:rocket: Using MARLlib, you can take advantage of various benefits, such as:

Installation

Note: MARLlib is currently compatible only with Linux operating systems.

Step-by-step (recommended)

1. install dependencies (basic)

First, install the MARLlib dependencies needed for basic usage by following this guide. Then install the patches for RLlib (step 3).

$ conda create -n marllib python=3.8 # or 3.9
$ conda activate marllib
$ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib
$ pip install -r requirements.txt

2. install environments (optional)

Please follow this guide.

Note: We recommend the gym version around 0.20.0.

pip install "gym==0.20.0"
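
To confirm which gym version actually ended up in the environment, a small standard-library check along these lines may help. It is only a sketch: the helper name `check_gym_version` is hypothetical, and the default of 0.20.0 is simply the version recommended in the note above.

```python
from importlib import metadata


def check_gym_version(recommended="0.20.0", package="gym"):
    """Return (installed_version, matches_recommended) for `package`.

    installed_version is None when the package is not installed.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None, False
    return installed, installed == recommended


if __name__ == "__main__":
    installed, ok = check_gym_version()
    if installed is None:
        print("gym is not installed")
    elif ok:
        print(f"gym {installed} matches the recommended version")
    else:
        print(f"gym {installed} is installed; the recommended version is 0.20.0")
```

Since the note says "around 0.20.0", a mismatch is a hint to double-check rather than a hard error.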

3. install patches (basic)

Patch known RLlib bugs by running the following commands:

$ cd /Path/To/MARLlib/marllib/patch
$ python add_patch.py -y

PyPI

$ pip install --upgrade pip
$ pip install marllib

Docker-based usage

We provide a Dockerfile for building the MARLlib docker image in MARLlib/docker/Dockerfile and a devcontainer setup in MARLlib/.devcontainer folder. If you use the devcontainer, one thing to note is that you may need to customise certain arguments in runArgs of devcontainer.json according to your hardware, for example the --shm-size argument.

Getting started

Prepare the configuration

Four groups of configuration control the whole training process:

- scenario: specify the environment/task settings
- algorithm: choose the hyperparameters of the algorithm
- model: customize the model architecture
- ray/rllib: change the basic training settings

Before training, ensure all the parameters are set correctly, especially those you don't want to change.

> __Note__:
> You can also modify all the pre-set parameters via the MARLlib API.

Register the environment

Ensure all the dependencies are installed for the environment you are running. If they are not, please refer to the [MARLlib documentation](https://marllib.readthedocs.io/en/latest/handbook/env.html).

| task mode | api example |
| :-----------: | ----------- |
| cooperative | `marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)` |
| collaborative | `marl.make_env(environment_name="mpe", map_name="simple_spread")` |
| competitive | `marl.make_env(environment_name="mpe", map_name="simple_adversary")` |
| mixed | `marl.make_env(environment_name="mpe", map_name="simple_crypto")` |

Most of the popular environments in MARL research are supported by MARLlib:

| Env Name | Learning Mode | Observability | Action Space | Observations |
| :-----------: | :-----------: | :-----------: | :-----------: | :-----------: |
| **[LBF](https://marllib.readthedocs.io/en/latest/handbook/env.html#lbf)** | cooperative + collaborative | Both | Discrete | 1D |
| **[RWARE](https://marllib.readthedocs.io/en/latest/handbook/env.html#rware)** | cooperative | Partial | Discrete | 1D |
| **[MPE](https://marllib.readthedocs.io/en/latest/handbook/env.html#mpe)** | cooperative + collaborative + mixed | Both | Both | 1D |
| **[SISL](https://marllib.readthedocs.io/en/latest/handbook/env.html#sisl)** | cooperative + collaborative | Full | Both | 1D |
| **[SMAC](https://marllib.readthedocs.io/en/latest/handbook/env.html#smac)** | cooperative | Partial | Discrete | 1D |
| **[MetaDrive](https://marllib.readthedocs.io/en/latest/handbook/env.html#metadrive)** | collaborative | Partial | Continuous | 1D |
| **[MAgent](https://marllib.readthedocs.io/en/latest/handbook/env.html#magent)** | collaborative + mixed | Partial | Discrete | 2D |
| **[Pommerman](https://marllib.readthedocs.io/en/latest/handbook/env.html#pommerman)** | collaborative + competitive + mixed | Both | Discrete | 2D |
| **[MAMuJoCo](https://marllib.readthedocs.io/en/latest/handbook/env.html#mamujoco)** | cooperative | Full | Continuous | 1D |
| **[GRF](https://marllib.readthedocs.io/en/latest/handbook/env.html#google-research-football)** | collaborative + mixed | Full | Discrete | 2D |
| **[Hanabi](https://marllib.readthedocs.io/en/latest/handbook/env.html#hanabi)** | cooperative | Partial | Discrete | 1D |
| **[MATE](https://marllib.readthedocs.io/en/latest/handbook/env.html#mate)** | cooperative + mixed | Partial | Both | 1D |
| **[GoBigger](https://marllib.readthedocs.io/en/latest/handbook/env.html#gobigger)** | cooperative + mixed | Both | Continuous | 1D |
| **[Overcooked-AI](https://marllib.readthedocs.io/en/latest/handbook/env.html#overcooked-ai)** | cooperative | Full | Discrete | 1D |
| **[PDN](https://marllib.readthedocs.io/en/latest/handbook/env.html#power-distribution-networks)** | cooperative | Partial | Continuous | 1D |
| **[AirCombat](https://marllib.readthedocs.io/en/latest/handbook/env.html#air-combat)** | cooperative + mixed | Partial | MultiDiscrete | 1D |
| **[HideAndSeek](https://marllib.readthedocs.io/en/latest/handbook/env.html#hide-and-seek)** | competitive + mixed | Partial | MultiDiscrete | 1D |

Each environment has a readme file that serves as the instructions for the task, covering env settings, installation, and important notes.
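
When switching between task modes in scripts, the task-mode table above can be captured as a small lookup. This is an illustrative sketch only: `MPE_TASK_MODE_KWARGS` and `make_env_kwargs` are hypothetical names, not part of MARLlib, and the keyword arguments are copied verbatim from the MPE examples in the table.

```python
# Keyword arguments copied from the task-mode table (all four rows use MPE maps).
MPE_TASK_MODE_KWARGS = {
    "cooperative": {"environment_name": "mpe", "map_name": "simple_spread", "force_coop": True},
    "collaborative": {"environment_name": "mpe", "map_name": "simple_spread"},
    "competitive": {"environment_name": "mpe", "map_name": "simple_adversary"},
    "mixed": {"environment_name": "mpe", "map_name": "simple_crypto"},
}


def make_env_kwargs(task_mode):
    """Return a fresh copy of the make_env keyword arguments for `task_mode`."""
    try:
        return dict(MPE_TASK_MODE_KWARGS[task_mode])
    except KeyError:
        known = ", ".join(sorted(MPE_TASK_MODE_KWARGS))
        raise ValueError(f"unknown task mode {task_mode!r}; expected one of: {known}") from None


# With MARLlib installed, the result can be unpacked into the documented call:
#   from marllib import marl
#   env = marl.make_env(**make_env_kwargs("cooperative"))
```

Note that cooperative and collaborative use the same map here and differ only in `force_coop=True`.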

Initialize the algorithm

| running target | api example |
| :-----------: | ----------- |
| train & finetune | `marl.algos.mappo(hyperparam_source=$ENV)` |
| develop & debug | `marl.algos.mappo(hyperparam_source="test")` |
| 3rd party env | `marl.algos.mappo(hyperparam_source="common")` |

Here is a chart describing the characteristics of each algorithm:

| algorithm | support task mode | discrete action | continuous action | policy type |
| :------------------------------------------------------------: | :-----------------: | :----------: | :--------------------: | :----------: |
| *IQL** | all four | :heavy_check_mark: | | off-policy |
| *[PG](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf)* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[A2C](https://arxiv.org/abs/1602.01783)* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[DDPG](https://arxiv.org/abs/1509.02971)* | all four | | :heavy_check_mark: | off-policy |
| *[TRPO](http://proceedings.mlr.press/v37/schulman15.pdf)* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[PPO](https://arxiv.org/abs/1707.06347)* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[COMA](https://ojs.aaai.org/index.php/AAAI/article/download/11794/11653)* | all four | :heavy_check_mark: | | on-policy |
| *[MADDPG](https://arxiv.org/abs/1706.02275)* | all four | | :heavy_check_mark: | off-policy |
| *MAA2C** | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *MATRPO** | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[MAPPO](https://arxiv.org/abs/2103.01955)* | all four | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[HATRPO](https://arxiv.org/abs/2109.11251)* | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[HAPPO](https://arxiv.org/abs/2109.11251)* | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *[VDN](https://arxiv.org/abs/1706.05296)* | cooperative | :heavy_check_mark: | | off-policy |
| *[QMIX](https://arxiv.org/abs/1803.11485)* | cooperative | :heavy_check_mark: | | off-policy |
| *[FACMAC](https://arxiv.org/abs/2003.06709)* | cooperative | | :heavy_check_mark: | off-policy |
| *[VDAC](https://arxiv.org/abs/2007.12306)* | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |
| *VDPPO** | cooperative | :heavy_check_mark: | :heavy_check_mark: | on-policy |

**all four**: cooperative, collaborative, competitive, mixed

*IQL* is the multi-agent version of Q learning.
*MAA2C* and *MATRPO* are the centralized versions of A2C and TRPO.
*VDPPO* is the value decomposition version of PPO.
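
To narrow down which algorithms fit a given task, the chart above can be transcribed into data and filtered. The sketch below is not part of MARLlib; `ALGORITHMS` and `candidates` are hypothetical names, and the rows are copied from the chart (a `True` in the second field marks algorithms listed as cooperative only).

```python
TASK_MODES = ("cooperative", "collaborative", "competitive", "mixed")

# (name, cooperative_only, discrete_action, continuous_action, policy_type)
ALGORITHMS = [
    ("IQL", False, True, False, "off-policy"),
    ("PG", False, True, True, "on-policy"),
    ("A2C", False, True, True, "on-policy"),
    ("DDPG", False, False, True, "off-policy"),
    ("TRPO", False, True, True, "on-policy"),
    ("PPO", False, True, True, "on-policy"),
    ("COMA", False, True, False, "on-policy"),
    ("MADDPG", False, False, True, "off-policy"),
    ("MAA2C", False, True, True, "on-policy"),
    ("MATRPO", False, True, True, "on-policy"),
    ("MAPPO", False, True, True, "on-policy"),
    ("HATRPO", True, True, True, "on-policy"),
    ("HAPPO", True, True, True, "on-policy"),
    ("VDN", True, True, False, "off-policy"),
    ("QMIX", True, True, False, "off-policy"),
    ("FACMAC", True, False, True, "off-policy"),
    ("VDAC", True, True, True, "on-policy"),
    ("VDPPO", True, True, True, "on-policy"),
]


def candidates(task_mode, action, policy=None):
    """List algorithm names (in chart order) matching a task mode and action type.

    `action` is "discrete" or "continuous"; `policy` optionally restricts the
    result to "on-policy" or "off-policy" algorithms.
    """
    if task_mode not in TASK_MODES:
        raise ValueError(f"unknown task mode: {task_mode!r}")
    if action not in ("discrete", "continuous"):
        raise ValueError(f"unknown action type: {action!r}")
    names = []
    for name, coop_only, discrete, continuous, policy_type in ALGORITHMS:
        if coop_only and task_mode != "cooperative":
            continue
        if not (discrete if action == "discrete" else continuous):
            continue
        if policy is not None and policy_type != policy:
            continue
        names.append(name)
    return names
```

For example, `candidates("cooperative", "discrete", policy="off-policy")` selects the off-policy methods usable on a discrete cooperative task such as SMAC.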

Build the agent model

An agent model consists of two parts, `encoder` and `core arch`. `encoder` will be constructed by MARLlib according to the observation space. Choose `mlp`, `gru`, or `lstm` as you like to build the complete model.

| model arch | api example |
| :-----------: | ----------- |
| MLP | `marl.build_model(env, algo, {"core_arch": "mlp"})` |
| GRU | `marl.build_model(env, algo, {"core_arch": "gru"})` |
| LSTM | `marl.build_model(env, algo, {"core_arch": "lstm"})` |
| Encoder Arch | `marl.build_model(env, algo, {"core_arch": "gru", "encode_layer": "128-256"})` |
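
The `encode_layer` value is a string such as `"128-256"`. Assuming it denotes dash-separated hidden-layer widths (an interpretation of the example above, not something stated here), a quick sanity check before passing it to `marl.build_model` could look like this; `parse_encode_layer` is a hypothetical helper, not a MARLlib function.

```python
def parse_encode_layer(spec):
    """Split a dash-separated spec such as "128-256" into a list of layer widths.

    Assumption: each dash-separated field is one hidden-layer width.
    """
    widths = []
    for part in spec.split("-"):
        part = part.strip()
        if not part.isdigit():
            raise ValueError(f"invalid layer width {part!r} in {spec!r}")
        widths.append(int(part))
    return widths


# The spec used in the table above yields two layers under this assumption.
layers = parse_encode_layer("128-256")
```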

Kick off the training

| setting | api example |
| :-----------: | ----------- |
| train | `algo.fit(env, model)` |
| debug | `algo.fit(env, model, local_mode=True)` |
| stop condition | `algo.fit(env, model, stop={'episode_reward_mean': 2000, 'timesteps_total': 10000000})` |
| policy sharing | `algo.fit(env, model, share_policy='all') # or 'group' / 'individual'` |
| save model | `algo.fit(env, model, checkpoint_freq=100, checkpoint_end=True)` |
| GPU accelerate | `algo.fit(env, model, local_mode=False, num_gpus=1)` |
| CPU accelerate | `algo.fit(env, model, local_mode=False, num_workers=5)` |
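
The settings in the table above can be combined in a single `fit` call. One way to assemble them, sketched here with a hypothetical helper `build_fit_kwargs` (not part of MARLlib), is to validate the policy-sharing option and collect the remaining keyword arguments in a dictionary. Only argument names that appear in the table are used.

```python
SHARE_POLICY_OPTIONS = ("all", "group", "individual")


def build_fit_kwargs(share_policy, stop=None, debug=False, **extra):
    """Collect keyword arguments for algo.fit(env, model, **kwargs).

    share_policy: one of the options listed in the table above.
    stop:         optional stop-condition dict, e.g. {"timesteps_total": 1000000}.
    debug:        True maps to local_mode=True, as in the table's debug row.
    extra:        any other documented arguments (checkpoint_freq, num_gpus, ...).
    """
    if share_policy not in SHARE_POLICY_OPTIONS:
        raise ValueError(f"share_policy must be one of {SHARE_POLICY_OPTIONS}")
    kwargs = {"share_policy": share_policy, "local_mode": debug}
    if stop:
        kwargs["stop"] = dict(stop)
    kwargs.update(extra)
    return kwargs


# Example: group-level sharing, a timestep budget, and periodic checkpoints.
train_kwargs = build_fit_kwargs(
    "group", stop={"timesteps_total": 1000000}, checkpoint_freq=100, checkpoint_end=True
)
# With MARLlib installed: algo.fit(env, model, **train_kwargs)
```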

Training & rendering API

```py
from marllib import marl

# prepare env
env = marl.make_env(environment_name="smac", map_name="5m_vs_6m")

# initialize algorithm with appointed hyper-parameters
mappo = marl.algos.mappo(hyperparam_source="smac")

# build agent model based on env + algorithms + user preference
model = marl.build_model(env, mappo, {"core_arch": "gru", "encode_layer": "128-256"})

# start training
mappo.fit(
    env, model,
    stop={"timesteps_total": 1000000},
    checkpoint_freq=100,
    share_policy="group"
)

# rendering
mappo.render(
    env, model,
    local_mode=True,
    restore_path={'params_path': "checkpoint/params.json",
                  'model_path': "checkpoint/checkpoint-10"}
)
```

Results

Under the current working directory, you can find all the training data (logging and TensorFlow files) as well as the saved models. To visualize the learning curve, you can use TensorBoard. Follow the steps below:

  1. Install TensorBoard by running the following command:

    pip install tensorboard

  2. Use the following command to launch TensorBoard and visualize the results:

    tensorboard --logdir .
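
If TensorBoard shows no runs, it can help to check that event files actually exist under the directory passed to `--logdir`. The sketch below assumes the standard TensorBoard event-file naming (`events.out.tfevents.*`); `find_event_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def find_event_dirs(root="."):
    """Return the sorted directories under `root` that hold TensorBoard event files."""
    root_path = Path(root)
    return sorted(
        {path.parent for path in root_path.rglob("events.out.tfevents.*") if path.is_file()}
    )


if __name__ == "__main__":
    for run_dir in find_event_dirs("."):
        print(run_dir)
```

Any directory it prints can be passed to `tensorboard --logdir` directly to inspect a single run.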

Alternatively, you can refer to this tutorial for more detailed instructions.

For a list of all the existing results, you can visit this link. Please note that these results were obtained from an older version of MARLlib, which may lead to inconsistencies when compared to the current results.

Quick examples

MARLlib provides some practical examples for you to refer to.

Tutorials

Try MPE + MAPPO examples on Google Colaboratory! More tutorial documentation is available here.

Awesome List

A collection of research and review papers of multi-agent reinforcement learning (MARL) is available. The papers have been organized based on their publication date and their evaluation of the corresponding environments.

Algorithms: Awesome Environments: Awesome

Community

| Channel | Link |
| :-----------: | :-----------: |
| Issues | GitHub Issues |

Roadmap

The roadmap to the future release is available in ROADMAP.md.

Contributing

We are a small team working on multi-agent reinforcement learning, and we will take all the help we can get! If you would like to get involved, here is information on contribution guidelines and how to test the code locally.

You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.

Citation

If you use MARLlib in your research, please cite the MARLlib paper.

@article{hu2022marllib,
  author  = {Siyi Hu and Yifan Zhong and Minquan Gao and Weixun Wang and Hao Dong and Xiaodan Liang and Zhihui Li and Xiaojun Chang and Yaodong Yang},
  title   = {MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
}

Works that are based on or closely collaborate with MARLlib <link>

@InProceedings{hu2022policy,
      title={Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent {RL}},
      author={Hu, Siyi and Xie, Chuanlong and Liang, Xiaodan and Chang, Xiaojun},
      booktitle={Proceedings of the 39th International Conference on Machine Learning},
      year={2022},
}
@misc{zhong2023heterogeneousagent,
      title={Heterogeneous-Agent Reinforcement Learning}, 
      author={Yifan Zhong and Jakub Grudzien Kuba and Siyi Hu and Jiaming Ji and Yaodong Yang},
      archivePrefix={arXiv},
      year={2023},
}