IDSIA / hhmarl_2D

Heterogeneous Hierarchical Multi Agent Reinforcement Learning for Air Combat

HHMARL 2D

Heterogeneous Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering. This repository is the implementation of the method proposed in this paper (arXiv:2309.11247).

Overview

We use low-level policies for either fight or escape maneuvers. These are trained first and then employed in the high-level hierarchy as part of the environment.

Required Packages

Training

Run train_hetero.py to train heterogeneous agents in low-level mode and train_hier.py to train the high-level policy (commander). The low-level policies must be pre-trained and stored before training of the commander policy can start. At this stage, low-level policy training is configured for 2vs2 and high-level policy training for 3vs3. The reason is the way Ray sets up centralized critics. Evaluations, however, can be run in any combat configuration.

Procedure

For training the full model, proceed as follows:

1) Run train_hetero.py

2) Run train_hier.py

3) Run evaluation.py
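
The three steps above can be chained in a small driver script. The following is only a minimal sketch: the helper names `build_commands` and `run_pipeline` are hypothetical and not part of the repository, and it assumes the scripts are started from the repository root without extra arguments (see config.py for the available options).

```python
import subprocess
import sys

# Script names are taken from the procedure above. Starting them without
# additional arguments is an assumption made for this sketch.
STAGES = ("train_hetero.py", "train_hier.py", "evaluation.py")


def build_commands(python=sys.executable, stages=STAGES):
    """Return one command (argument list) per stage, in execution order."""
    return [[python, script] for script in stages]


def run_pipeline(commands, runner=subprocess.run):
    """Run the commands in order and stop at the first failure.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit code, so a later stage never starts after a failed earlier stage.
    """
    completed = []
    for cmd in commands:
        runner(cmd, check=True)
        completed.append(cmd[1])
    return completed
```

Calling `run_pipeline(build_commands())` would then start the three stages one after another. Stopping at the first failure matters here because the commander training depends on the stored low-level policies.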

Curriculum Learning

Configurations

The most important arguments to set are the following. All arguments can be found in config.py.

Inference

Levels 4 and 5 use the previously learned policies (fictitious self-play). Ray appears to behave inconsistently when its method Policy.compute_single_action() is called. Therefore, from level 3 onwards, the learned policies are stored during training in the folder policies, and the actions are then computed manually inside the method _policy_actions(). You can also export policies manually by running policy_export.py (have a look at it and configure it as needed).

Commander Sensing

Change N_OPPS_HL in env_hier.py, train_hier.py and ac_models_hier.py to change the number of detected opponents (N2-vs-N3 in the paper). For example, setting N_OPPS_HL=3 lets the Commander detect 3 opponents per agent and select one of these three to attack.
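
Because the same constant has to be edited in three files, a quick consistency check can catch a forgotten file before training starts. The sketch below is not part of the repository: the helpers `read_n_opps_hl` and `check_n_opps_hl` are hypothetical, the file paths are passed in by the caller, and it assumes each file defines the constant as a module-level integer assignment such as `N_OPPS_HL = 2`.

```python
import re
from pathlib import Path

# Assumption: the constant is a module-level integer assignment,
# e.g. `N_OPPS_HL = 2`, at the start of a line.
_PATTERN = re.compile(r"^N_OPPS_HL\s*=\s*(\d+)", re.MULTILINE)


def read_n_opps_hl(path):
    """Return the integer assigned to N_OPPS_HL in `path`, or None if absent."""
    match = _PATTERN.search(Path(path).read_text(encoding="utf-8"))
    return int(match.group(1)) if match else None


def check_n_opps_hl(paths):
    """Return the shared N_OPPS_HL value; raise ValueError on a mismatch."""
    values = {str(p): read_n_opps_hl(p) for p in paths}
    if None in values.values() or len(set(values.values())) != 1:
        raise ValueError(f"N_OPPS_HL is missing or inconsistent: {values}")
    return next(iter(values.values()))
```

It could be called with the paths of env_hier.py, train_hier.py and ac_models_hier.py as they are laid out in your checkout; the function returns the common value or raises if one file was missed.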

GPU vs CPU

Ray allows training on GPU, but in several experiments the performance was worse than on CPU. The reason is still unknown, and this might improve in future versions. In our case, the GPU was an RTX 3080 Ti and the CPU an i9-13900H.

Note

HHMARL 3D is on its way with more advanced rendering ...

Citation

@misc{hhmarl2d,
  author = {Ardian Selmonaj and Oleg Szehr and Giacomo Del Rio and Alessandro Antonucci and Adrian Schneider and Michael Rüegsegger},
  title = {Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering},
  year = {2023},
  eprint = {arXiv:2309.11247},
}