Farama-Foundation / MicroRTS-Py

A simple and highly efficient RTS-game-inspired environment for reinforcement learning (formerly Gym-MicroRTS)
MIT License

Formerly Gym-μRTS/Gym-MicroRTS

This repo contains the source code for the Gym wrapper of μRTS, the RTS game authored by Santiago Ontañón.

MicroRTS-Py will eventually be updated, maintained, and made compliant with the standards of the Farama Foundation (https://farama.org/project_standards). However, this is currently a lower priority than other projects we're working to maintain. If you'd like to contribute to development, you can join our Discord server: https://discord.gg/jfERDCSw.


Get Started

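Prerequisites: Python, Poetry (used below to install the dependencies), a Java runtime (μRTS itself is written in Java), and FFmpeg (for video recording).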

$ git clone --recursive https://github.com/Farama-Foundation/MicroRTS-Py.git && \
cd MicroRTS-Py
poetry install
# The `poetry install` command above creates a virtual environment for us, in which all the dependencies are installed.
# We can use `poetry shell` to create a new shell in which this environment is activated. Once we are done working with
# MicroRTS, we can leave it again using `exit`.
poetry shell
# By default, the torch wheel is built with CUDA 10.2. If you are using newer NVIDIA GPUs (e.g., RTX 3060 Ti), you may need to install the CUDA 11.3 wheels by overriding the torch dependency with pip:
# poetry run pip install "torch==1.12.1" --upgrade --extra-index-url https://download.pytorch.org/whl/cu113
python hello_world.py

If the poetry install command gets stuck on a Linux machine, it may help to first run: export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring.

To train an agent, run the following:

cd experiments
python ppo_gridnet.py \
    --total-timesteps 100000000 \
    --capture-video \
    --seed 1


To run a partially observable example, set the --partial-obs flag:

cd experiments
python ppo_gridnet.py \
    --partial-obs \
    --capture-video \
    --seed 1

Technical Paper

Before diving into the code, we highly recommend reading the preprint of our paper: Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.

Deprecation notes

  1. Note that the experiments in the technical paper above were run with gym_microrts==0.3.2. Beyond v0.4.x, we plan to deprecate UAS (unit action simulation) despite its better performance in the paper, because UAS has a more complex implementation and makes it very difficult to incorporate self-play or imitation learning in the future.
  2. v0.6.1 is the last version in which wall/terrain observations were not present in state tensors. As of December 2023, every state observation includes an additional terrain feature encoding the presence of walls, so models trained before this change are no longer compatible with the code in the master branch. Such models should use the code from v0.6.1 instead.

Environment Specification

The tables below summarize Gym-μRTS's observation features and action components, where $a_r=7$ is the maximum attack range and - means not applicable.

| Observation Feature | Planes | Description |
|---|---|---|
| Hit Points | 5 | 0, 1, 2, 3, $\geq 4$ |
| Resources | 5 | 0, 1, 2, 3, $\geq 4$ |
| Owner | 3 | -, player 1, player 2 |
| Unit Types | 8 | -, resource, base, barrack, worker, light, heavy, ranged |
| Current Action | 6 | -, move, harvest, return, produce, attack |
| Terrain | 2 | free, wall |
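To make the plane layout concrete, here is a minimal sketch that one-hot encodes a single cell's features into the 5 + 5 + 3 + 8 + 6 + 2 = 29 planes listed in the observation table. The helper name `encode_cell`, the order of the feature groups, and the use of index 0 for "-" are assumptions for illustration; the environment source is authoritative for the exact layout. Under these assumptions, stacking the per-cell vectors gives an observation of shape (h, w, 29).

```python
# Number of one-hot planes per observation feature, taken from the table above.
FEATURE_PLANES = {
    "hit_points": 5,      # 0, 1, 2, 3, >= 4
    "resources": 5,       # 0, 1, 2, 3, >= 4
    "owner": 3,           # -, player 1, player 2
    "unit_type": 8,       # -, resource, base, barrack, worker, light, heavy, ranged
    "current_action": 6,  # -, move, harvest, return, produce, attack
    "terrain": 2,         # free, wall
}

NUM_PLANES = sum(FEATURE_PLANES.values())


def encode_cell(hit_points, resources, owner, unit_type, current_action, terrain):
    """Hypothetical helper: one-hot encode one grid cell into NUM_PLANES values.

    hit_points and resources are raw counts, clipped into the ">= 4" bucket;
    the other arguments are indices into the categories listed in the table.
    """
    indices = {
        "hit_points": min(hit_points, 4),
        "resources": min(resources, 4),
        "owner": owner,
        "unit_type": unit_type,
        "current_action": current_action,
        "terrain": terrain,
    }
    planes = []
    for name, size in FEATURE_PLANES.items():  # dicts keep insertion order
        index = indices[name]
        if not 0 <= index < size:
            raise ValueError(f"{name} index {index} outside [0, {size - 1}]")
        one_hot = [0] * size
        one_hot[index] = 1
        planes.extend(one_hot)
    return planes


# A player-1 worker (unit type index 4) with 1 hit point, carrying no resources,
# currently harvesting (action index 2), standing on free terrain (index 0).
cell = encode_cell(hit_points=1, resources=0, owner=1, unit_type=4,
                   current_action=2, terrain=0)
print(NUM_PLANES, sum(cell))  # 29 6
```

Each feature group contributes exactly one active plane, which is why the encoded cell sums to 6.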
| Action Component | Range | Description |
|---|---|---|
| Source Unit | $[0, h \times w - 1]$ | the location of the unit selected to perform an action |
| Action Type | $[0, 5]$ | NOOP, move, harvest, return, produce, attack |
| Move Parameter | $[0, 3]$ | north, east, south, west |
| Harvest Parameter | $[0, 3]$ | north, east, south, west |
| Return Parameter | $[0, 3]$ | north, east, south, west |
| Produce Direction Parameter | $[0, 3]$ | north, east, south, west |
| Produce Type Parameter | $[0, 6]$ | resource, base, barrack, worker, light, heavy, ranged |
| Relative Attack Position | $[0, a_r^2 - 1]$ | the relative location of the unit that will be attacked |
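The ranges in the action table translate directly into the number of choices per component. The sketch below assumes a grid-style encoding in which every cell issues its own action, so the Source Unit is given by the cell position and each cell carries the remaining seven components. It also assumes that the Relative Attack Position indexes an $a_r \times a_r$ window flattened row by row and centered on the attacker. Both are illustrative assumptions rather than a description of the library internals, and `grid_action_nvec` / `decode_attack_offset` are hypothetical names.

```python
A_R = 7  # maximum attack range a_r from the table above

# Number of choices for each per-cell action component (from the "Range" column).
PER_CELL_NVEC = [
    6,          # action type: NOOP, move, harvest, return, produce, attack
    4,          # move direction
    4,          # harvest direction
    4,          # return direction
    4,          # produce direction
    7,          # produce type
    A_R * A_R,  # relative attack position, [0, a_r^2 - 1]
]


def grid_action_nvec(h, w):
    """Component sizes for an h x w map when every cell gets its own action."""
    return PER_CELL_NVEC * (h * w)


def decode_attack_offset(index, a_r=A_R):
    """Hypothetical decoding of a relative attack index into (dx, dy).

    Assumes the a_r x a_r window is flattened row-major and centered on the
    attacking unit, so index (a_r * a_r) // 2 is the unit's own cell.
    """
    if not 0 <= index < a_r * a_r:
        raise ValueError(f"index must be in [0, {a_r * a_r - 1}]")
    center = a_r // 2
    dy, dx = divmod(index, a_r)
    return dx - center, dy - center


print(len(grid_action_nvec(16, 16)))  # 1792
print(decode_attack_offset(0))        # (-3, -3)
print(decode_attack_offset(24))       # (0, 0)
```

With $a_r = 7$ each cell has 6 + 4 + 4 + 4 + 4 + 7 + 49 = 78 possible component values, and a 16x16 map yields 256 x 7 = 1792 components in total under this layout.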

Evaluation

You can evaluate trained agents against a built-in bot:

cd experiments
python ppo_gridnet_eval.py \
    --agent-model-path gym-microrts-static-files/agent_sota.pt \
    --ai coacAI

Alternatively, you can evaluate trained RL agents against each other (here, the same model against itself):

cd experiments
python ppo_gridnet_eval.py \
    --agent-model-path gym-microrts-static-files/agent_sota.pt \
    --agent2-model-path gym-microrts-static-files/agent_sota.pt

Evaluate the TrueSkill of agents

This repository already contains a preset TrueSkill database in experiments/league.db. To evaluate a new AI, run the following command, which iteratively finds good matches for the agent (here agent_sota.pt) until the engine is confident in the agent's TrueSkill, i.e., until the agent's TrueSkill sigma falls below the --highest-sigma threshold (1.4 in this example).

cd experiments
python league.py --evals gym-microrts-static-files/agent_sota.pt --highest-sigma 1.4 --update-db False
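The stopping rule described above (keep scheduling matches until sigma drops below the threshold) can be illustrated with a self-contained sketch. It uses a simplified two-player TrueSkill update without draws or the dynamics factor, picks each opponent by TrueSkill match quality, and keeps the opponents' ratings fixed. This is not the code in league.py: the function names, the opponent pool `bot_a` to `bot_e`, their ratings, and the deterministic `play` outcome are all hypothetical.

```python
import math

# Default TrueSkill prior and performance noise (beta = sigma0 / 2).
MU0, SIGMA0 = 25.0, 25.0 / 3
BETA = SIGMA0 / 2


def pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def match_quality(a, b):
    """Two-player TrueSkill match quality; ratings are (mu, sigma) tuples."""
    c2 = 2 * BETA**2 + a[1] ** 2 + b[1] ** 2
    return math.sqrt(2 * BETA**2 / c2) * math.exp(-((a[0] - b[0]) ** 2) / (2 * c2))


def update(winner, loser):
    """Simplified win/loss update: no draws, no dynamics factor (tau)."""
    c2 = 2 * BETA**2 + winner[1] ** 2 + loser[1] ** 2
    c = math.sqrt(c2)
    t = (winner[0] - loser[0]) / c
    v = pdf(t) / cdf(t)
    w = v * (v + t)
    new_winner = (winner[0] + winner[1] ** 2 / c * v,
                  winner[1] * math.sqrt(1 - winner[1] ** 2 / c2 * w))
    new_loser = (loser[0] - loser[1] ** 2 / c * v,
                 loser[1] * math.sqrt(1 - loser[1] ** 2 / c2 * w))
    return new_winner, new_loser


def evaluate(play, pool, highest_sigma=1.4, max_games=1000):
    """Play the best-quality match each round until sigma < highest_sigma."""
    agent = (MU0, SIGMA0)
    games = 0
    while agent[1] >= highest_sigma and games < max_games:
        opponent = max(pool, key=lambda name: match_quality(agent, pool[name]))
        if play(opponent):  # True if the agent wins
            agent, _ = update(agent, pool[opponent])
        else:
            _, agent = update(pool[opponent], agent)
        games += 1  # opponent ratings in `pool` are never modified
    return agent, games


# Hypothetical opponent pool and a hidden "true strength" for the new agent.
pool = {"bot_a": (10.0, 1.0), "bot_b": (20.0, 1.0), "bot_c": (25.0, 1.0),
        "bot_d": (30.0, 1.0), "bot_e": (40.0, 1.0)}
hidden_strength = 27.0
agent, games = evaluate(lambda name: hidden_strength > pool[name][0], pool)
print(f"mu={agent[0]:.2f} sigma={agent[1]:.2f} after {games} games")
```

Choosing the opponent with the highest match quality keeps each game informative: when the agent's mu is close to its opponent's, the variance reduction per game stays large, so sigma shrinks quickly toward the threshold.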

To recreate the preset TrueSkill database, remove the existing experiments/league.db (and league.csv) and start a round-robin TrueSkill evaluation among the built-in AIs:

cd experiments
rm league.csv league.db
python league.py --evals randomBiasedAI workerRushAI lightRushAI coacAI
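In a round-robin evaluation, every AI plays every other AI. A minimal sketch of enumerating such pairings with `itertools.combinations`; the `games_per_pair` parameter and the alternation of sides are illustrative assumptions, not options of league.py.

```python
from itertools import combinations


def round_robin(ais, games_per_pair=2):
    """List every (player1, player2) match, swapping sides on alternate games."""
    matches = []
    for a, b in combinations(ais, 2):
        for g in range(games_per_pair):
            matches.append((a, b) if g % 2 == 0 else (b, a))
    return matches


ais = ["randomBiasedAI", "workerRushAI", "lightRushAI", "coacAI"]
matches = round_robin(ais)
print(len(matches))  # 12
print(matches[:2])   # [('randomBiasedAI', 'workerRushAI'), ('workerRushAI', 'randomBiasedAI')]
```

Four AIs give 4 x 3 / 2 = 6 unordered pairs; swapping sides removes any first-player advantage from the comparison.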

Multi-map support

The training script lets you train agents on multiple maps and evaluate them on multiple maps. Try executing:

cd experiments
python ppo_gridnet.py \
    --train-maps maps/16x16/basesWorkers16x16B.xml maps/16x16/basesWorkers16x16C.xml maps/16x16/basesWorkers16x16D.xml maps/16x16/basesWorkers16x16E.xml maps/16x16/basesWorkers16x16F.xml \
    --eval-maps maps/16x16/basesWorkers16x16B.xml maps/16x16/basesWorkers16x16C.xml maps/16x16/basesWorkers16x16D.xml maps/16x16/basesWorkers16x16E.xml maps/16x16/basesWorkers16x16F.xml

Here, --train-maps specifies the training maps and --eval-maps the evaluation maps. The two lists do not have to match, so you can evaluate on maps the agent has never trained on.
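Multi-value flags like these are commonly declared with argparse's `nargs="+"`. The sketch below is an assumption about how such options could be parsed and distributed across parallel environments (round-robin assignment via a hypothetical `assign_maps` helper, plus a check for held-out evaluation maps); the actual ppo_gridnet.py implementation may differ.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # nargs="+" collects one or more space-separated map paths per flag.
    parser.add_argument("--train-maps", nargs="+", required=True)
    parser.add_argument("--eval-maps", nargs="+", required=True)
    return parser.parse_args(argv)


def assign_maps(maps, num_envs):
    """Hypothetical helper: give each parallel env one map, cycling through the list."""
    return [maps[i % len(maps)] for i in range(num_envs)]


args = parse_args([
    "--train-maps", "maps/16x16/basesWorkers16x16B.xml", "maps/16x16/basesWorkers16x16C.xml",
    "--eval-maps", "maps/16x16/basesWorkers16x16C.xml", "maps/16x16/basesWorkers16x16D.xml",
])
env_maps = assign_maps(args.train_maps, num_envs=4)
held_out = sorted(set(args.eval_maps) - set(args.train_maps))
print(env_maps)
print(held_out)  # ['maps/16x16/basesWorkers16x16D.xml']
```

Maps that appear only in the evaluation list are the ones the agent never saw during training.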

Known issues

Rendering does not work correctly on macOS. See https://github.com/jpype-project/jpype/issues/906

Papers written using Gym-μRTS

PettingZoo API

We wrapped the Gym-µRTS simulator in a PettingZoo environment, defined in gym_microrts/pettingzoo_api.py. An example of its usage can be found in hello_world_pettingzoo.py.

Cite this project

To cite the Gym-µRTS simulator:

@inproceedings{huang2021gym,
  author    = {Shengyi Huang and
               Santiago Onta{\~{n}}{\'{o}}n and
               Chris Bamford and
               Lukasz Grela},
  title     = {Gym-{\(\mathrm{\mu}\)}RTS: Toward Affordable Full Game Real-time Strategy
               Games Research with Deep Reinforcement Learning},
  booktitle = {2021 {IEEE} Conference on Games (CoG), Copenhagen, Denmark, August
               17-20, 2021},
  pages     = {671--678},
  publisher = {{IEEE}},
  year      = {2021},
  url       = {https://doi.org/10.1109/CoG52621.2021.9619076},
  doi       = {10.1109/CoG52621.2021.9619076},
  timestamp = {Fri, 10 Dec 2021 10:41:01 +0100},
  biburl    = {https://dblp.org/rec/conf/cig/HuangO0G21.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

To cite the invalid action masking technique used in our training script:

@inproceedings{huang2020closer,
  author    = {Shengyi Huang and
               Santiago Onta{\~{n}}{\'{o}}n},
  editor    = {Roman Bart{\'{a}}k and
               Fazel Keshtkar and
               Michael Franklin},
  title     = {A Closer Look at Invalid Action Masking in Policy Gradient Algorithms},
  booktitle = {Proceedings of the Thirty-Fifth International Florida Artificial Intelligence
               Research Society Conference, {FLAIRS} 2022, Hutchinson Island, Jensen
               Beach, Florida, USA, May 15-18, 2022},
  year      = {2022},
  url       = {https://doi.org/10.32473/flairs.v35i.130584},
  doi       = {10.32473/flairs.v35i.130584},
  timestamp = {Thu, 09 Jun 2022 16:44:11 +0200},
  biburl    = {https://dblp.org/rec/conf/flairs/HuangO22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}