PKU-MARL / Multi-Agent-Transformer

342 stars 71 forks source link

Multi-Agent Transformer

This is the official implementation of MAT. MAT is a novel neural network based on the encoder-decoder architecture that implements a multi-agent learning process through sequence models, aiming to build the bridge between MARL and SM so that the modeling power of modern sequence models, the Transformer, can be unleashed for MARL.

For more details, please visit our page site about Muti-Agent Transformer: https://sites.google.com/view/multi-agent-transformer.

In short, MAT:

We present GIFs below to show the architecture and dynamic data flow of MAT.
Architecture of MAT

Installation

Dependences

pip install -r requirements.txt

Multi-agent MuJoCo

Following the instructios in https://github.com/openai/mujoco-py and https://github.com/schroederdewitt/multiagent_mujoco to setup a mujoco environment. In the end, remember to set the following environment variables:

LD_LIBRARY_PATH=${HOME}/.mujoco/mujoco200/bin;
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so

StarCraft II & SMAC

Run the script

bash install_sc2.sh

Or you could install them manually to other path you like, just follow here: https://github.com/oxwhirl/smac.

Google Research Football

Please following the instructios in https://github.com/google-research/football.

Bi-DexHands

Please following the instructios in https://github.com/PKU-MARL/DexterousHands.

How to run

When your environment is ready, you could run shells in the "scripts" folder with algo="mat" or algo="mat_dec". For example:

./train_mujoco.sh  # run MAT/MAT-Dec on Multi-agent MuJoCo

If you would like to change the configs of experiments, you could modify sh files or look for config.py for more details.

Multi-Agent Sequential Decision Paradigm

Conventional multi-agent learning paradigm (left) wherein all agents take actions simultaneously vs. the multi-agent sequential decision paradigm (right) where agents take actions by following a sequential order, each agent accounts for decisions from preceding agents as red arrows suggest.

The key insight of the multi-agent sequential decision paradigm is the multi-agent advantage decomposition theorem (a discovery in HATRPO/HAPPO [ICLR 22, Kuba et.al], indicating the advantage of joint actions could be sequentially divided as shown below.

Performance Comparisons on Cooperative MARL Benchmarks

MAT consistently outperforms its rivals, indicating its modeling capability for homogeneous-agent tasks (agents are interchangeable).

Videos on four super-hard scenarios are shown below.

27m vs 30m MMM2 6h vs 8z 3s5z vs 3s6z

Demonstration and Performance comparison on Multi-Agent Mujoco HalfCheetah and Bimanual Dexterous Hands Manipulation tasks, showing MAT's advantages in robot control for heterogeneous agents (agents are not interchangeable).

Performance comparison on the Google Research Football tasks with 2-4 agents from left to right respectively, telling the same conclusion that MAT outperforms MAPPO and HAPPO.

MAT as Excellent Few-short Learners

Few-shot performance comparison with models pre-trained on complete HalfCheetah. MAT exhibits powerful generalisation capability when parts of the robot fail.

Few-shot performance comparison with pre-trained models on multiple SMAC tasks. Sequence-modeling-based methods, MAT and MAT-Dec, enjoy superior performance over MAPPO, justifying their strong generalisation capability as few-shot learners.

Citation

Please cite as following if you think this work is helpful for you:

@article{wen2022multi,
  title={Multi-Agent Reinforcement Learning is a Sequence Modeling Problem},
  author={Wen, Muning and Kuba, Jakub Grudzien and Lin, Runji and Zhang, Weinan and Wen, Ying and Wang, Jun and Yang, Yaodong},
  journal={arXiv preprint arXiv:2205.14953},
  year={2022}
}