COOM is a Continual Learning benchmark for embodied pixel-based RL, consisting of task sequences in visually distinct 3D environments with diverse objectives and egocentric perception. COOM is designed for task-incremental learning, in which task boundaries are clearly defined. A short presentation of COOM can be found on SlidesLive and a demo is available on YouTube.
To install COOM from PyPI, run:
$ pip install COOM
Alternatively, to install COOM from source, clone this repository, change into its directory, and install it:
$ git clone https://github.com/hyintell/COOM
$ cd COOM
$ pip install .
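To confirm that either installation route worked, one option is to query the installed distribution from Python. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name and not part of COOM.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("COOM")
    print(f"COOM version: {version}" if version else "COOM is not installed")
```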
COOM contains 8 scenarios:
Scenario | Success Metric | Enemies | Weapon | Items | Max Steps | Execute Action | Stochasticity |
---|---|---|---|---|---|---|---|
Pitfall | Distance Covered | ✗ | ✗ | ✗ | 1000 | JUMP | Pitfall tile locations |
Arms Dealer | Weapons Delivered | ✗ | ✓ | ✓ | 1000 | SPEED | Weapon spawn locations, delivery locations |
Hide and Seek | Frames Alive | ✓ | ✗ | ✓ | 2500 | SPEED | Enemy behaviour, item spawn locations |
Floor is Lava | Frames Alive | ✗ | ✗ | ✗ | 2500 | SPEED | Platform locations |
Chainsaw | Kill Count | ✓ | ✓ | ✗ | 2500 | ATTACK | Enemy and agent spawn locations |
Raise the Roof | Frames Alive | ✗ | ✗ | ✗ | 2500 | USE | Agent spawn location |
Run and Gun | Kill Count | ✓ | ✓ | ✗ | 2500 | ATTACK | Enemy and agent spawn locations |
Health Gathering | Frames Alive | ✗ | ✗ | ✓ | 2500 | SPEED | Health kit spawn locations |
Every scenario except Run and Gun has 2 environments: default and hard. Run and Gun has additional variants. The full list of environments is the following:
pitfall-default-v0 - traverse a tunnel as far as possible without falling into a pit
pitfall-hard-v0 - the agent has reduced movement speed and there are more pits in the surface
arms_dealer-default-v0 - collect and deliver weapons to the marked locations as fast as possible
arms_dealer-hard-v0 - the map is larger and there are fewer weapons to collect
hide_and_seek-default-v0 - escape and hide from enemies as long as possible
hide_and_seek-hard-v0 - there are more enemies and they are faster
floor_is_lava-default-v0 - keep off the lava by navigating to new platforms as they reappear at new locations
floor_is_lava-hard-v0 - there are fewer platforms to stand on and their locations change more rapidly
chainsaw-default-v0 - seek out and melee as many enemies as possible
chainsaw-hard-v0 - there are fewer enemies to find, and they are tougher to eliminate
raise_the_roof-default-v0 - locate and press switches on the walls to raise the ceiling and avoid getting crushed
raise_the_roof-hard-v0 - the ceiling is lowered faster and the switches are harder to spot
run_and_gun-default-v0 - use a pistol to eliminate as many enemies as possible
run_and_gun-hard-v0 - there are fewer enemies to find, and they are tougher to eliminate
run_and_gun-obstacles-v0 - there are obstacles in the environment which block the agent's movement
run_and_gun-green-v0 - the wall, ceiling and floor textures are green
run_and_gun-resized-v0 - the agent's view height is randomized and the targets are randomly resized
run_and_gun-monsters-v0 - the targets are replaced with monsters that move around and fight back
run_and_gun-red-v0 - the wall, ceiling and floor textures are red
run_and_gun-blue-v0 - the wall, ceiling and floor textures are blue
run_and_gun-shadows-v0 - the targets are less visible due to lowered opacity
health_gathering-default-v0 - collect health kits as fast as possible to stay alive
health_gathering-hard-v0 - there are fewer health kits to find and the environment layout is more complex

To formulate a continual learning problem, we compose sequences of tasks, where each task is an environment of a scenario. The agent is trained on each task sequentially, without access to the previous tasks. The agent is continually evaluated on all tasks throughout training. The task sequence is considered solved if the agent achieves maximum success on all tasks. There are three lengths of Continual Learning task sequences in our benchmark:
1) 8-task sequences serve as the core of the benchmark
2) 4-task sequences consist of the second half of an 8-task sequence
3) 16-task sequences combine the tasks of two 8-task sequences
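The relationship between the three sequence lengths can be illustrated with plain lists of environment names. This is a minimal sketch, not COOM's own implementation: the task orders are placeholders chosen for illustration, treating "combine" as simple concatenation is an assumption, and `second_half` and `combine` are hypothetical helper names.

```python
from typing import List

# Placeholder 8-task sequences; the order is illustrative, not the benchmark's.
SEQUENCE_A: List[str] = [
    "pitfall-default-v0",
    "arms_dealer-default-v0",
    "hide_and_seek-default-v0",
    "floor_is_lava-default-v0",
    "chainsaw-default-v0",
    "raise_the_roof-default-v0",
    "run_and_gun-default-v0",
    "health_gathering-default-v0",
]
SEQUENCE_B: List[str] = [name.replace("-default-", "-hard-") for name in SEQUENCE_A]


def second_half(sequence: List[str]) -> List[str]:
    """4-task sequence: the second half of an 8-task sequence."""
    return sequence[len(sequence) // 2:]


def combine(first: List[str], second: List[str]) -> List[str]:
    """16-task sequence: two 8-task sequences, assumed here to run back to back."""
    return first + second


print(second_half(SEQUENCE_A))
print(len(combine(SEQUENCE_A, SEQUENCE_B)))
```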
We further distinguish between the Cross-Domain and Cross-Objective sequences.
In the cross-domain setting, the agent is sequentially trained on modified versions of the same scenario. Run and Gun is selected as the basis for this CL sequence, since out of the 8 scenarios in the benchmark, it best resembles the actual Doom game, requiring the agent to navigate the map and eliminate enemies by firing a weapon. The objective and the layout of the map remain the same across tasks, whereas we modify the environment in the following ways:
1) Changing the textures of the surrounding walls, ceiling and floor
2) Varying the size, shape and type of enemies
3) Randomizing the view height of the agent, and
4) Adding objects to the environment which act as obstacles, blocking the agent’s movement.
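As an illustration, the four kinds of modification can be matched to the Run and Gun environments listed earlier. The grouping below is our reading of the environment descriptions, not an official taxonomy, and `MODIFICATIONS` and `environment_id` are hypothetical names.

```python
from typing import Dict, List

# Run and Gun variants grouped by the kind of change they introduce.
# The grouping is an assumption inferred from the environment descriptions.
MODIFICATIONS: Dict[str, List[str]] = {
    "textures": ["green", "red", "blue"],
    "enemies": ["resized", "monsters", "shadows"],
    "view_height": ["resized"],
    "obstacles": ["obstacles"],
}


def environment_id(variant: str, scenario: str = "run_and_gun") -> str:
    """Build an environment name in the '<scenario>-<variant>-v0' pattern."""
    return f"{scenario}-{variant}-v0"


# Collect every modified environment once, preserving first-seen order.
variants = list(dict.fromkeys(v for group in MODIFICATIONS.values() for v in group))
cross_domain_envs = [environment_id(v) for v in variants]
print(cross_domain_envs)
```

Note that `resized` appears in two groups because that environment changes both the view height and the target size.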
Cross-objective task sequences employ a different scenario with a novel objective for each consecutive task, rather than only changing the visuals and dynamics of a single scenario. This presents a diverse challenge, as the goal might drastically change from locating and eliminating enemies (Run and Gun and Chainsaw) to running away and hiding from them (Hide and Seek). In a similar fashion, the scenario Floor is Lava often requires the agent to remain at a bounded location for optimal performance, whereas the scenarios Pitfall, Arms Dealer, Raise the Roof, and Health Gathering encourage constant movement.
Below we provide a short code snippet to run a sequence with the COOM benchmark.
Find examples of using COOM environments in the run_single and run_sequence scripts.
from COOM.env.builder import make_env
from COOM.utils.config import Scenario

# Create a single-task environment and run it with random actions
env = make_env(Scenario.RAISE_THE_ROOF)
env.reset()
for steps in range(1000):
    action = env.action_space.sample()
    state, reward, done, truncated, info = env.step(action)
    env.render()
    # Stop when the episode terminates or is truncated
    if done or truncated:
        break
env.close()
from COOM.env.continual import ContinualLearningEnv
from COOM.utils.config import Sequence

# Iterate over the tasks of the CO8 sequence and run one random episode on each
cl_env = ContinualLearningEnv(Sequence.CO8)
for env in cl_env.tasks:
    env.reset()
    done = truncated = False
    # Stop when the episode terminates or is truncated
    while not (done or truncated):
        action = env.action_space.sample()
        state, reward, done, truncated, info = env.step(action)
        env.render()
    env.close()
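Because `env.step` returns both `done` and `truncated`, a rollout loop usually needs to stop on either flag. The helper below is a minimal sketch of that pattern. `run_episode` and the stand-in `CountdownEnv` are hypothetical names used so the example runs without ViZDoom. The sketch also assumes a Gymnasium-style `reset` that returns an `(observation, info)` pair, which the snippets above do not confirm.

```python
import random
from typing import Callable, Tuple


class CountdownEnv:
    """Stand-in environment (not part of COOM) that truncates after a few steps."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Tuple[int, dict]:
        self.steps = 0
        return 0, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.steps += 1
        truncated = self.steps >= self.max_steps  # step limit reached
        return self.steps, 1.0, False, truncated, {}


def run_episode(env, policy: Callable[[object], int]) -> float:
    """Roll out one episode and return the sum of rewards."""
    state, _ = env.reset()  # assumes reset() returns (observation, info)
    total_reward = 0.0
    done = truncated = False
    while not (done or truncated):
        state, reward, done, truncated, info = env.step(policy(state))
        total_reward += reward
    return total_reward


print(run_episode(CountdownEnv(), policy=lambda state: random.randrange(3)))
```

With a real COOM environment, the policy would typically be replaced by `lambda state: env.action_space.sample()` or by a trained agent.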
We have employed various popular continual learning algorithms to evaluate their performance on the COOM benchmark. The algorithms are implemented on top of the Soft Actor-Critic (SAC) reinforcement learning algorithm. Please follow the instructions in the Continual Learning module to use the algorithms. The following table ranks the baselines from best to worst performing:
Method | Type | Score |
---|---|---|
PackNet | Structure | 0.74 |
ClonEx-SAC | Memory | 0.73 |
L2 | Regularization | 0.64 |
MAS | Regularization | 0.56 |
EWC | Regularization | 0.54 |
Fine-Tuning | Naïve | 0.40 |
VCL | Regularization | 0.33 |
AGEM | Memory | 0.28 |
Perfect Memory* | Memory | 0.89* |
*The memory consumption of the method is too high to feasibly run it on the longer sequences of the benchmark, so it does not follow the ranking in the table.
We evaluate the continual learning methods on the COOM benchmark based on Average Performance, Forgetting, and Forward Transfer.
The performance (success rate) averaged over tasks is a typical metric for the continual learning setting. The agent is continually evaluated on all tasks in the sequence, even before encountering them. By the end of the sequence, the agent should have mastered all tasks.
Forgetting occurs when the agent's performance on a task decreases after training on a subsequent task. This is a common problem in continual learning, as the agent has to learn new tasks while retaining the knowledge of the previous ones. We measure forgetting by comparing the agent's performance on a task right after training on it with its performance at the end of the entire sequence. The image below depicts heavy forgetting in the example of AGEM.
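A minimal sketch of this metric, assuming success rates are stored as one list per evaluation point with one entry per task. The data layout and the name `average_performance` are our assumptions, not COOM's API, and the numbers are made up.

```python
from typing import List, Sequence


def average_performance(success: Sequence[Sequence[float]]) -> List[float]:
    """Mean success rate over all tasks at every evaluation point.

    success[t][i] is the success rate on task i at evaluation point t.
    """
    return [sum(row) / len(row) for row in success]


# Three evaluation points on a two-task sequence (made-up numbers).
history = [
    [0.0, 0.0],  # before training
    [1.0, 0.0],  # after training on task 0
    [0.5, 1.0],  # after training on task 1
]
print(average_performance(history))
```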
Contrary to AGEM, ClonEx-SAC is able to retain the knowledge of the previous tasks.
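The comparison described above can be sketched as a per-task difference. This is an illustrative reading of the definition: the sign convention (positive means forgotten) and the name `forgetting` are assumptions, and the numbers are made up.

```python
from typing import List, Sequence


def forgetting(after_training: Sequence[float], at_end: Sequence[float]) -> List[float]:
    """Per-task drop in success between the end of its own training and the end of the sequence.

    Positive values mean the task was forgotten; negative values mean it improved later.
    """
    if len(after_training) != len(at_end):
        raise ValueError("both inputs need one entry per task")
    return [trained - final for trained, final in zip(after_training, at_end)]


# Made-up success rates for a three-task sequence.
print(forgetting(after_training=[0.75, 0.5, 1.0], at_end=[0.25, 0.5, 1.0]))
```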
Transferring learned knowledge from one task to another is a key aspect of continual learning. We measure the forward transfer of the continual learning methods by how efficiently they train on each given task compared to the Soft Actor-Critic (SAC) baseline, which is trained directly on the same task from scratch. The red areas between the curves represent negative forward transfer and other colors represent positive forward transfer as depicted on the image below.
To reproduce the results in our paper, please follow the instructions in the results module.
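One way to quantify the area between the curves is the normalized difference of areas under the training curves, which to our understanding follows the definition used in Continual World. The sketch below assumes success rates in [0, 1] sampled at evenly spaced training steps. Whether COOM applies exactly this normalization is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def area_under_curve(curve: Sequence[float]) -> float:
    """Trapezoidal area under a success curve, with the x-axis rescaled to [0, 1]."""
    if len(curve) < 2:
        raise ValueError("need at least two points")
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(curve, curve[1:]))


def forward_transfer(method_curve: Sequence[float], baseline_curve: Sequence[float]) -> float:
    """Normalized area between a method's curve and the from-scratch baseline.

    Positive when the method learns the task faster than the baseline, negative when slower.
    """
    auc = area_under_curve(method_curve)
    auc_baseline = area_under_curve(baseline_curve)
    if auc_baseline >= 1.0:
        raise ValueError("baseline is already perfect; transfer is undefined")
    return (auc - auc_baseline) / (1.0 - auc_baseline)


# Made-up curves: the method reaches full success earlier than the baseline.
print(forward_transfer(method_curve=[0.0, 1.0, 1.0], baseline_curve=[0.0, 0.0, 1.0]))
```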
COOM is based on the ViZDoom platform.
The Cross-Domain task sequences and the run_and_gun scenario environment modifications were inspired by the LevDoom generalization benchmark.
The base implementations of SAC and continual learning methods originate from Continual World.
Our experiments were managed using WandB.
If you use our work in your research, please cite it as follows:
@inproceedings{tomilin2023coom,
title={COOM: A Game Benchmark for Continual Reinforcement Learning},
author={Tomilin, Tristan and Fang, Meng and Zhang, Yudi and Pechenizkiy, Mykola},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023}
}