COOM: Benchmarking Continual Reinforcement Learning on Doom
MIT License

COOM

COOM is a Continual Learning benchmark for embodied pixel-based RL, consisting of task sequences in visually distinct 3D environments with diverse objectives and egocentric perception. COOM is designed for task-incremental learning, in which task boundaries are clearly defined. A short presentation of COOM can be found on SlidesLive and a demo is available on YouTube.

Installation

To install COOM from PyPI, run:

$ pip install COOM

Alternatively, to install COOM from source:

  1. Clone the repository
    $ git clone https://github.com/hyintell/COOM
  2. Navigate into the repository
    $ cd COOM
  3. Install COOM from source with pip
    $ pip install .

Environments

COOM contains 8 scenarios:

| Scenario | Success Metric | Enemies | Weapon | Items | Max Steps | Execute Action | Stochasticity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pitfall | Distance Covered | | | | 1000 | JUMP | Pitfall tile locations |
| Arms Dealer | Weapons Delivered | | | | 1000 | SPEED | Weapon spawn locations, delivery locations |
| Hide and Seek | Frames Alive | | | | 2500 | SPEED | Enemy behaviour, item spawn locations |
| Floor is Lava | Frames Alive | | | | 2500 | SPEED | Platform locations |
| Chainsaw | Kill Count | | | | 2500 | ATTACK | Enemy and agent spawn locations |
| Raise the Roof | Frames Alive | | | | 2500 | USE | Agent spawn location |
| Run and Gun | Kill Count | | | | 2500 | ATTACK | Enemy and agent spawn locations |
| Health Gathering | Frames Alive | | | | 2500 | SPEED | Health kit spawn locations |

Every scenario except Run and Gun has two environments: default and hard.

Task Sequences for Continual Learning

To formulate a continual learning problem, we compose sequences of tasks, where each task is an environment of a scenario. The agent is trained on each task sequentially, without access to the previous tasks. The agent is continually evaluated on all tasks throughout training. The task sequence is considered solved if the agent achieves maximum success on all tasks. There are three lengths of Continual Learning task sequences in our benchmark:

  1. 8-task sequences serve as the core of the benchmark
  2. 4-task sequences consist of the 2nd half of an 8-task sequence
  3. 16-task sequences combine tasks of two 8-task sequences
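
The protocol described above can be sketched as a plain loop. This is a minimal illustration and not COOM's training code: `train_on` and `evaluate` are hypothetical placeholders, the task names are made up, and evaluation is simplified to happen once after each task rather than throughout training.

```python
from typing import Callable, Dict, List


def run_continual_protocol(
    tasks: List[str],
    train_on: Callable[[str], None],
    evaluate: Callable[[str], float],
) -> List[Dict[str, float]]:
    """Train on each task in order; after each one, evaluate on every task."""
    history = []
    for current in tasks:
        train_on(current)  # only the current task is used for training
        history.append({task: evaluate(task) for task in tasks})
    return history


# Illustrative task names only; the real sequences are defined by the benchmark.
eight_tasks = [f"task_{i}" for i in range(1, 9)]
# A 4-task sequence is the 2nd half of an 8-task sequence.
four_tasks = eight_tasks[len(eight_tasks) // 2:]

# Toy stand-ins (assumption): a task scores 1.0 once it has been trained on.
seen = set()
history = run_continual_protocol(four_tasks, seen.add, lambda t: float(t in seen))

# The sequence counts as solved if every task is at maximum success at the end.
solved = all(score == 1.0 for score in history[-1].values())
print(four_tasks, solved)
```

In this toy setup nothing is ever forgotten, so the sequence ends up solved; a real agent would typically lose performance on earlier tasks.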

We further distinguish between the Cross-Domain and Cross-Objective sequences.

Cross-Domain

In the cross-domain setting, the agent is sequentially trained on modified versions of the same scenario. Run and Gun is selected as the basis for this CL sequence, since out of the 8 scenarios in the benchmark, it best resembles the actual Doom game, requiring the agent to navigate the map and eliminate enemies by firing a weapon. The objective and the layout of the map remain the same across tasks, whereas we modify the environment in the following ways:

  1. Changing the textures of the surrounding walls, ceiling and floor
  2. Varying the size, shape and type of enemies
  3. Randomizing the view height of the agent
  4. Adding objects to the environment which act as obstacles, blocking the agent's movement

Tasks in the Cross-Domain 8 (CD8) sequence

Cross-Objective

Cross-objective task sequences employ a different scenario with a novel objective for each consecutive task, rather than only changing the visuals and dynamics of a single scenario. This presents a diverse challenge, as the goal might drastically change from locating and eliminating enemies (Run and Gun and Chainsaw) to running away and hiding from them (Hide and Seek). In a similar fashion, the scenario Floor is Lava often requires the agent to remain at a bounded location for optimal performance, whereas the scenarios Pitfall, Arms Dealer, Raise the Roof, and Health Gathering encourage constant movement.

Tasks in the Cross-Objective 8 (CO8) sequence

Getting Started

Below we provide a short code snippet to run a sequence with the COOM benchmark.

Basic Usage

Find examples of using COOM environments in the run_single and run_sequence scripts.

Single Environment

from COOM.env.builder import make_env
from COOM.utils.config import Scenario

# Build a single environment and roll out random actions
env = make_env(Scenario.RAISE_THE_ROOF)
env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    state, reward, done, truncated, info = env.step(action)
    env.render()
    # Stop when the episode terminates or is cut off
    if done or truncated:
        break
env.close()

Task Sequence

from COOM.env.continual import ContinualLearningEnv
from COOM.utils.config import Sequence

# Run one random-action episode on every task of the CO8 sequence
cl_env = ContinualLearningEnv(Sequence.CO8)
for env in cl_env.tasks:
    env.reset()
    done = truncated = False
    # Step until the episode terminates or is cut off
    while not (done or truncated):
        action = env.action_space.sample()
        state, reward, done, truncated, info = env.step(action)
        env.render()
    env.close()

Baseline Results

We have employed various popular continual learning algorithms to evaluate their performance on the COOM benchmark. The algorithms are implemented on top of the Soft Actor-Critic (SAC) reinforcement learning algorithm. Please follow the instructions in the Continual Learning module to use the algorithms. The following table ranks the baselines from best to worst performing:

| Method | Type | Score |
| --- | --- | --- |
| PackNet | Structure | 0.74 |
| ClonEx-SAC | Memory | 0.73 |
| L2 | Regularization | 0.64 |
| MAS | Regularization | 0.56 |
| EWC | Regularization | 0.54 |
| Fine-Tuning | Naïve | 0.40 |
| VCL | Regularization | 0.33 |
| AGEM | Memory | 0.28 |
| Perfect Memory* | Memory | 0.89* |

*The memory consumption of the method is too high to feasibly run it on the longer sequences of the benchmark, so it does not follow the ranking in the table.

Evaluation Metrics

We evaluate the continual learning methods on the COOM benchmark based on Average Performance, Forgetting, and Forward Transfer.

Average Performance

The performance (success rate) averaged over tasks is a typical metric for the continual learning setting. The agent is continually evaluated on all tasks in the sequence, even before encountering them. By the end of the sequence, the agent should have mastered all tasks.
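
As a rough illustration of this metric, the sketch below averages success rates over tasks at each evaluation point. The function name, the layout of the `success` matrix, and all numbers are assumptions made for the example; they are not taken from the COOM code base or results.

```python
from typing import List


def average_performance(success: List[List[float]]) -> List[float]:
    """success[k][i] is the success rate on task i at evaluation point k.

    Returns the mean over tasks at every evaluation point.
    """
    return [sum(row) / len(row) for row in success]


# Made-up success rates for a 3-task sequence, evaluated after each task.
# Tasks that have not been encountered yet are still evaluated (here: 0.0).
success = [
    [0.9, 0.0, 0.0],   # after training on task 1
    [0.6, 0.9, 0.0],   # after training on task 2
    [0.5, 0.75, 1.0],  # after training on task 3 (end of the sequence)
]

curve = average_performance(success)
final_average = curve[-1]
print(curve, final_average)
```

The last entry of `curve` corresponds to the average performance at the end of the sequence.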

Forgetting

Forgetting occurs when the agent's performance on a task decreases after training on a subsequent task. This is a common problem in continual learning, as the agent has to learn new tasks while retaining the knowledge of the previous ones. We measure forgetting by comparing the agent's performance on a task right after training on it with its performance at the end of the entire sequence. The image below depicts heavy forgetting in the example of AGEM.

Contrary to AGEM, ClonEx-SAC is able to retain the knowledge of the previous tasks.
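
The comparison described in this section can be written down in a few lines. This is a minimal sketch under stated assumptions: the `success` matrix layout and the numbers are invented for illustration, and any normalization or averaging used in the paper is not reproduced here.

```python
from typing import List


def forgetting(success: List[List[float]]) -> List[float]:
    """success[k][i] is the success rate on task i right after training on task k.

    Forgetting of task i is its success right after training on it
    minus its success at the end of the entire sequence.
    """
    last = len(success) - 1
    return [success[i][i] - success[last][i] for i in range(len(success))]


# Made-up success rates for a 3-task sequence.
success = [
    [0.9, 0.0, 0.0],   # after training on task 1
    [0.6, 0.9, 0.0],   # after training on task 2
    [0.5, 0.75, 1.0],  # after training on task 3 (end of the sequence)
]

per_task = forgetting(success)
print(per_task)
```

A positive value means performance on that task dropped after later training; the final task always has zero forgetting under this definition.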

Forward Transfer

Transferring learned knowledge from one task to another is a key aspect of continual learning. We measure the forward transfer of the continual learning methods by how efficiently they train on each given task compared to the Soft Actor-Critic (SAC) baseline, which is trained directly on the same task from scratch. The red areas between the curves represent negative forward transfer and other colors represent positive forward transfer, as depicted in the image below.
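
One way to quantify the area between the two training curves is the signed difference of their areas under the curve. The sketch below assumes both curves are sampled at the same evenly spaced steps; the function names and the numbers are hypothetical, and any further normalization applied in the paper is not reproduced here.

```python
from typing import Sequence


def area_under_curve(curve: Sequence[float]) -> float:
    """Trapezoidal area under an evenly sampled success curve.

    Divided by the number of intervals, so a constant curve of 1.0 gives 1.0.
    """
    if len(curve) < 2:
        raise ValueError("need at least two points")
    total = sum((a + b) / 2 for a, b in zip(curve, curve[1:]))
    return total / (len(curve) - 1)


def transfer_area(method_curve: Sequence[float], baseline_curve: Sequence[float]) -> float:
    """Signed area between the curves: > 0 is positive transfer, < 0 is negative."""
    if len(method_curve) != len(baseline_curve):
        raise ValueError("curves must be sampled at the same steps")
    return area_under_curve(method_curve) - area_under_curve(baseline_curve)


# Made-up success curves on one task.
baseline = [0.0, 0.25, 0.5, 0.75, 1.0]  # SAC trained from scratch
method = [0.5, 0.75, 1.0, 1.0, 1.0]     # continual learner starting from prior tasks

gain = transfer_area(method, baseline)
print(gain)
```

Swapping the two curves flips the sign, which corresponds to the red (negative transfer) regions in the plot.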

Reproducing results

To reproduce the results in our paper, please follow the instructions in the results module.

Acknowledgements

COOM is based on the ViZDoom platform.
The Cross-Domain task sequences and the run_and_gun scenario environment modification were inspired by the LevDoom generalization benchmark.
The base implementations of SAC and continual learning methods originate from Continual World.
Our experiments were managed using WandB.

Citation

If you use our work in your research, please cite it as follows:

@inproceedings{tomilin2023coom,
    title={COOM: A Game Benchmark for Continual Reinforcement Learning},
    author={Tomilin, Tristan and Fang, Meng and Zhang, Yudi and Pechenizkiy, Mykola},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2023}
}