This repo contains the source code of MATE, the Multi-Agent Tracking Environment. The full documentation can be found at https://mate-gym.readthedocs.io. The full list of implemented agents can be found in section Implemented Algorithms. For a detailed description, please check out our paper (PDF, bibtex).
This is an asymmetric two-team zero-sum stochastic game with partial observations, where each team has multiple agents (multiplayer). Intra-team communications are allowed, but inter-team communications are prohibited. The game is cooperative among teammates and competitive between the two teams (opponents).
git config --global core.symlinks true # required on Windows
pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
NOTE: Python 3.7+ is required, and Python versions lower than 3.7 are not supported.
It is highly recommended to create a new isolated virtual environment for MATE using conda:
git clone https://github.com/XuehaiPan/mate.git && cd mate
conda env create --no-default-packages --file conda-recipes/basic.yaml # or full-cpu.yaml to install RLlib
conda activate mate
Make the MultiAgentTracking environment and play!
import mate
# Base environment for MultiAgentTracking
env = mate.make('MultiAgentTracking-v0')
env.seed(0)
done = False
camera_joint_observation, target_joint_observation = env.reset()
while not done:
    camera_joint_action, target_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    (
        (camera_joint_observation, target_joint_observation),
        (camera_team_reward, target_team_reward),
        done,
        (camera_infos, target_infos)
    ) = env.step((camera_joint_action, target_joint_action))
Another example with a built-in single-team wrapper (see also Built-in Wrappers):
import mate
env = mate.make('MultiAgentTracking-v0')
env = mate.MultiTarget(env, camera_agent=mate.GreedyCameraAgent(seed=0))
env.seed(0)
done = False
target_joint_observation = env.reset()
while not done:
    target_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    target_joint_observation, target_team_reward, done, target_infos = env.step(target_joint_action)
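Symmetrically, the camera team can be wrapped into a single-team environment, with the target team driven by a built-in agent. The sketch below mirrors the target-team example above, using the MultiCamera wrapper and GreedyTargetAgent that also appear later in this README:

import mate

env = mate.make('MultiAgentTracking-v0')
# Single-team wrapper for the camera team; targets are controlled by the built-in greedy agent.
env = mate.MultiCamera(env, target_agent=mate.GreedyTargetAgent(seed=0))
env.seed(0)
done = False
camera_joint_observation = env.reset()
while not done:
    camera_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    camera_joint_observation, camera_team_reward, done, camera_infos = env.step(camera_joint_action)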
[Demo: 4 Cameras vs. 8 Targets (9 Obstacles)]
mate/evaluate.py contains the example evaluation code for the MultiAgentTracking environment. Try out the following demos:
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 2 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v2-9.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v8-9.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-8v8-9.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 0 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v8-0.yaml
# <MultiAgentTracking<MultiAgentTracking-v0>>(0 cameras, 8 targets, 32 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-Navigation.yaml
[Demos: 4 Cameras vs. 2 Targets (9 obstacles) | 4 Cameras vs. 8 Targets (9 obstacles) | 8 Cameras vs. 8 Targets (9 obstacles) | 4 Cameras vs. 8 Targets (no obstacles) | 8 Targets Navigation (no cameras)]
You can specify the agent classes and arguments by:
python3 -m mate.evaluate --camera-agent module:class --camera-kwargs <JSON-STRING> --target-agent module:class --target-kwargs <JSON-STRING>
You can find the example code for agents in examples. The full list of implemented agents can be found in section Implemented Algorithms. For example:
# Example demos in examples
python3 -m examples.naive
# Use the evaluation script
python3 -m mate.evaluate --episodes 1 --render-communication \
--camera-agent examples.greedy:GreedyCameraAgent --camera-kwargs '{"memory_period": 20}' \
--target-agent examples.greedy:GreedyTargetAgent \
--config MATE-4v8-9.yaml \
--seed 0
You can implement your own custom agent classes to play around with. See Make Your Own Agents for more details.
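For instance, a custom target agent might look like the sketch below. This is only an illustration under assumptions: the base class name mate.TargetAgentBase, the act() signature, and the action_space attribute are assumed from how the built-in rule-based agents are structured; consult Make Your Own Agents for the actual interface.

import mate

# Hypothetical custom agent; the base class name, act() signature, and
# action_space attribute are assumptions based on the built-in agents.
class MyTargetAgent(mate.TargetAgentBase):
    def act(self, observation, info=None, deterministic=None):
        # Replace this with your own decision logic.
        return self.action_space.sample()

Such an agent can then be plugged into the single-team wrappers (e.g. mate.MultiCamera(env, target_agent=MyTargetAgent())) or passed to the evaluation script via --target-agent module:class.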
The MultiAgentTracking environment accepts a Python dictionary mapping or a configuration file in JSON or YAML format.
If you want to use customized environment configurations, you can copy the default configuration file:
cp "$(python3 -m mate.assets)"/MATE-4v8-9.yaml MyEnvCfg.yaml
Then make some modifications for your own. Use the modified environment by:
env = mate.make('MultiAgentTracking-v0', config='/path/to/your/cfg/file')
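Because a plain Python dictionary mapping is also accepted, you can load a preset, tweak it programmatically, and pass the resulting dict directly. A minimal sketch (it assumes PyYAML is available; the actual configuration keys are not reproduced here, so inspect the copied YAML file for the schema):

import yaml

import mate

# Load the copied preset as a plain Python dict and modify entries as needed
# (see MyEnvCfg.yaml for the available keys).
with open('MyEnvCfg.yaml') as file:
    config = yaml.safe_load(file)

env = mate.make('MultiAgentTracking-v0', config=config)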
There are several preset configuration files in the mate/assets directory.
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 2 targets, 9 obstacles)
env = mate.make('MATE-4v2-9-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)
env = mate.make('MATE-4v8-9-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)
env = mate.make('MATE-8v8-9-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 0 obstacles)
env = mate.make('MATE-4v8-0-v0')
# <MultiAgentTracking<MultiAgentTracking-v0>>(0 cameras, 8 targets, 32 obstacles)
env = mate.make('MATE-Navigation-v0')
You can reinitialize the environment with a new configuration without creating a new instance:
>>> env = mate.make('MultiAgentTracking-v0', wrappers=[mate.MoreTrainingInformation]) # we support wrappers
>>> print(env)
<MoreTrainingInformation<MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)>
>>> env.load_config('MATE-8v8-9.yaml')
>>> print(env)
<MoreTrainingInformation<MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)>
Besides, we provide a script mate/assets/generator.py to generate a configuration file with reasonable camera placement:
python3 -m mate.assets.generator --path 24v48.yaml --num-cameras 24 --num-targets 48 --num-obstacles 20
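The generated file can then be passed to the environment like any other configuration file, for example:

import mate

# Use the configuration file produced by the generator command above.
env = mate.make('MultiAgentTracking-v0', config='24v48.yaml')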
See Environment Customization for more details.
MATE provides multiple wrappers for different settings, such as full observability, discrete action spaces, single-team multi-agent control, etc. See Built-in Wrappers for more details.
| Type | Wrapper | Description |
|---|---|---|
| observation | `EnhancedObservation` | Enhance the agent's observation, which sets all observation masks to `True`. |
| observation | `SharedFieldOfView` | Share field of view among agents in the same team, which applies the `or` operator over the observation masks. The target agents share the empty status of warehouses. |
| observation | `MoreTrainingInformation` | Add more environment and agent information to the `info` field of `step()`, enabling full observability of the environment. |
| observation | `RescaledObservation` | Rescale all entity states in the observation to [-1, +1]. |
| observation | `RelativeCoordinates` | Convert all locations of other entities in the observation to relative coordinates. |
| action | `DiscreteCamera` | Allow cameras to use discrete actions. |
| action | `DiscreteTarget` | Allow targets to use discrete actions. |
| reward | `AuxiliaryCameraRewards` | Add additional auxiliary rewards for each individual camera. |
| reward | `AuxiliaryTargetRewards` | Add additional auxiliary rewards for each individual target. |
| single-team | `MultiCamera` | Wrap into a single-team multi-agent environment. |
| single-team | `MultiTarget` | Wrap into a single-team multi-agent environment. |
| single-team | `SingleCamera` | Wrap into a single-team single-agent environment. |
| single-team | `SingleTarget` | Wrap into a single-team single-agent environment. |
| communication | `MessageFilter` | Filter messages from agents in intra-team communications. |
| communication | `RandomMessageDropout` | Randomly drop messages in communication channels. |
| communication | `RestrictedCommunicationRange` | Add a restricted communication range to channels. |
| communication | `NoCommunication` | Disable intra-team communications, i.e., filter out all messages. |
| communication | `ExtraCommunicationDelays` | Add extra message delays to communication channels. |
| miscellaneous | `RepeatedRewardIndividualDone` | Repeat the `reward` field and assign an individual `done` field in `step()`, which is similar to MPE. |
You can create an environment with multiple wrappers at once. For example:
env = mate.make('MultiAgentTracking-v0',
                wrappers=[
                    mate.EnhancedObservation,
                    mate.MoreTrainingInformation,
                    mate.WrapperSpec(mate.DiscreteCamera, levels=5),
                    mate.WrapperSpec(mate.MultiCamera, target_agent=mate.GreedyTargetAgent(seed=0)),
                    mate.RepeatedRewardIndividualDone,
                    mate.WrapperSpec(mate.AuxiliaryCameraRewards,
                                     coefficients={'raw_reward': 1.0,
                                                   'coverage_rate': 1.0,
                                                   'soft_coverage_score': 1.0,
                                                   'baseline': -2.0}),
                ])
The following algorithms are implemented in examples:

Rule-based:

- Random (mate/agents/random.py)
- Naive (mate/agents/naive.py)
- Greedy (mate/agents/greedy.py)
- Heuristic (mate/agents/heuristic.py)

Multi-Agent Reinforcement Learning Algorithms:
Multi-Agent Reinforcement Learning Algorithms with Multi-Agent Communication:
Population Based Adversarial Policy Learning, available meta-solvers:
NOTE: all learning-based algorithms are tested with Ray 1.12.0 on Ubuntu 20.04 LTS.
If you find MATE useful, please consider citing:
@inproceedings{pan2022mate,
title = {{MATE}: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control},
author = {Xuehai Pan and Mickel Liu and Fangwei Zhong and Yaodong Yang and Song-Chun Zhu and Yizhou Wang},
booktitle = {Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year = {2022},
url = {https://openreview.net/forum?id=SyoUVEyzJbE}
}
MIT License