This codebase provides the open-source implementation, built on the Dopamine framework, for running the Atari experiments in Reincarnating RL. In this work, we leverage the policy from an existing agent (e.g., DQN trained for 400M environment frames) to reincarnate another deep Q-learning agent. Refer to agarwl.github.io/reincarnating_rl for the project page.
This release is a work-in-progress. More instructions to be added soon.
The teacher checkpoints for pre-trained deep RL agents are in the public GCP bucket gs://rl_checkpoints (browser link), which can be downloaded using gsutil. To install gsutil, follow the official gsutil installation instructions.
After installing gsutil, run the following command to download the final checkpoint and Dopamine replay buffer for a DQN (Adam) agent trained for 400 million environment frames on Atari 2600 games:
gsutil -m cp -R gs://rl_checkpoints/DQN_400 ./
To download the checkpoint for a specific Atari game only, replace [GAME_NAME] with the name of the game (e.g., Breakout to download the checkpoint for the game of Breakout) and run the command:
gsutil -m cp -R gs://rl_checkpoints/DQN_400/[GAME_NAME] ./
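When checkpoints for several games are needed, the per-game command above can be generated programmatically. The following is a minimal Python sketch and not part of this repository: the helper name checkpoint_download_command is hypothetical, and any game name other than Breakout is an assumption that should be checked against the directory names in the bucket. The sketch only prints the commands; it does not run gsutil.

```python
import shlex


def checkpoint_download_command(game, dest="./", bucket="gs://rl_checkpoints/DQN_400"):
    """Return the gsutil argument list that copies one game's checkpoint directory.

    Hypothetical helper: it mirrors the command shown in this README and
    only substitutes the game name.
    """
    if not game or "/" in game:
        raise ValueError(f"invalid game name: {game!r}")
    return ["gsutil", "-m", "cp", "-R", f"{bucket}/{game}", dest]


if __name__ == "__main__":
    # "Pong" is an illustrative name; verify it exists in the bucket first.
    for game in ["Breakout", "Pong"]:
        print(shlex.join(checkpoint_download_command(game)))
```

Returning an argument list rather than a single string means the result can be passed directly to subprocess.run without shell quoting issues.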
Note that the agents were trained using the recommended training protocol on Atari with sticky actions, i.e., at every time step there is a 25% chance that the environment will execute the agent's previous action again instead of the agent's new action.
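To make the sticky-action rule concrete, here is a small illustrative Python sketch. It is a simplification written for this README: the class names StickyActionEnv and EchoEnv are hypothetical, and the actual Atari environments implement sticky actions inside the emulator rather than through a wrapper like this one.

```python
import random


class StickyActionEnv:
    """Repeats the previously executed action with probability repeat_prob."""

    def __init__(self, env, repeat_prob=0.25, seed=None):
        self.env = env
        self.repeat_prob = repeat_prob
        self.rng = random.Random(seed)
        self.prev_action = None

    def reset(self):
        self.prev_action = None
        return self.env.reset()

    def step(self, action):
        # With probability repeat_prob, ignore the new action and
        # execute the previous one again.
        if self.prev_action is not None and self.rng.random() < self.repeat_prob:
            action = self.prev_action
        self.prev_action = action
        return self.env.step(action)


class EchoEnv:
    """Hypothetical toy environment whose step() returns the executed action."""

    def reset(self):
        return 0

    def step(self, action):
        return action


def sticky_fraction(num_steps=100_000, repeat_prob=0.25, seed=0):
    """Fraction of steps on which the executed action differs from the requested one."""
    env = StickyActionEnv(EchoEnv(), repeat_prob=repeat_prob, seed=seed)
    env.reset()
    overridden = 0
    for t in range(num_steps):
        requested = t  # a fresh action id each step, so every repeat is detectable
        executed = env.step(requested)
        overridden += executed != requested
    return overridden / num_steps


if __name__ == "__main__":
    print(f"fraction of overridden actions: {sticky_fraction():.3f}")
```

Over many steps the printed fraction should be close to the configured probability of 0.25.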
Install Dopamine as a library following the installation instructions in the Dopamine repository. Alternatively, use the following command:
pip install git+https://github.com/google/dopamine.git
To use the Atari environments, follow the instructions provided in the Dopamine prerequisites. First, install ale-py (we recommend using a virtual environment):
pip install ale-py
Then unzip and import the ROMs (replace $ROM_DIR with the directory you extracted the ROMs to):
unzip $ROM_DIR/ROMS.zip -d $ROM_DIR && ale-import-roms $ROM_DIR/ROMS
Once you have set up Dopamine, clone this repository:
git clone https://github.com/google-research/reincarnating_rl.git
The entry point for training policy-to-value reincarnating RL (PVRL) agents on Atari 2600 games is reincarnating_rl/train.py.
To run any PVRL agent given a teacher agent, we first need to download the teacher checkpoints to $TEACHER_CKPT_DIR. To do so, we download the checkpoints of a DQN (Adam) agent trained for 400M frames on Breakout.
export TEACHER_CKPT_DIR="<Insert directory name here>"
mkdir -p $TEACHER_CKPT_DIR/Breakout
gsutil -m cp -R gs://rl_checkpoints/DQN_400/Breakout $TEACHER_CKPT_DIR
Assuming that you have cloned the reincarnating_rl repository, run the QDaggerRainbow agent using the following command:
cd reincarnating_rl
python -um reincarnating_rl.train \
--agent qdagger_rainbow \
--gin_files reincarnating_rl/configs/qdagger_rainbow.gin \
--base_dir /tmp/qdagger_rainbow \
--teacher_checkpoint_dir $TEACHER_CKPT_DIR/Breakout/1 \
--teacher_checkpoint_number 399 \
--run_number=1 \
--atari_roms_path=/tmp/atari_roms \
--alsologtostderr
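The value 399 passed to --teacher_checkpoint_number is consistent with a 400M-frame teacher under Dopamine's usual Atari settings, which this README does not state explicitly: 250,000 agent steps per iteration, a frame skip of 4, and checkpoints numbered from 0. The sketch below works through that arithmetic; the function name and default values are assumptions for illustration, so check them against the teacher's actual training configuration.

```python
def final_checkpoint_number(total_frames,
                            agent_steps_per_iteration=250_000,
                            frame_skip=4):
    """Index of the last checkpoint, assuming one checkpoint per iteration from 0.

    The defaults are assumed values (typical Dopamine Atari settings),
    not values taken from this repository.
    """
    frames_per_iteration = agent_steps_per_iteration * frame_skip
    num_iterations, remainder = divmod(total_frames, frames_per_iteration)
    if num_iterations == 0 or remainder != 0:
        raise ValueError("total_frames must be a positive multiple of one iteration")
    return num_iterations - 1


if __name__ == "__main__":
    # 400M frames / 1M frames per iteration = 400 iterations, indexed 0..399.
    print(final_checkpoint_number(400_000_000))
```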
To use an Impala CNN architecture for the Rainbow agent, pass the flag --gin_bindings @reincarnation_networks.ImpalaRainbowNetwork to the above command. More generally, since this code is based on Dopamine, it can be easily configured using the gin configuration framework.
To run a quick experiment for testing or debugging, you can use the following command:
python -um reincarnating_rl.train \
--agent qdagger_rainbow \
--gin_files reincarnating_rl/configs/qdagger_rainbow.gin \
--base_dir /tmp/qdagger_rainbow \
--teacher_checkpoint_dir $TEACHER_CKPT_DIR/Breakout/1 \
--teacher_checkpoint_number 399 \
--atari_roms_path=/tmp/atari_roms \
--run_number=1 \
--gin_bindings="Runner.evaluation_steps=10" \
--gin_bindings="RunnerWithTeacher.num_pretraining_iterations=2" \
--gin_bindings="RunnerWithTeacher.num_pretraining_steps=10" \
--gin_bindings="JaxDQNAgent.min_replay_history = 64" \
--alsologtostderr
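When sweeping over several gin overrides, it can be convenient to assemble the command above in Python instead of editing a long shell invocation by hand. The following is a minimal sketch under that assumption: the helper build_train_command is hypothetical and not part of this repository, and it only reuses the flags and values that appear in the commands in this README.

```python
import shlex


def build_train_command(teacher_ckpt_dir,
                        gin_bindings=None,
                        base_dir="/tmp/qdagger_rainbow",
                        teacher_checkpoint_number=399):
    """Return the training command as an argument list (hypothetical helper)."""
    args = [
        "python", "-um", "reincarnating_rl.train",
        "--agent", "qdagger_rainbow",
        "--gin_files", "reincarnating_rl/configs/qdagger_rainbow.gin",
        "--base_dir", base_dir,
        "--teacher_checkpoint_dir", teacher_ckpt_dir,
        "--teacher_checkpoint_number", str(teacher_checkpoint_number),
        "--atari_roms_path=/tmp/atari_roms",
        "--run_number=1",
    ]
    # Each gin override becomes its own --gin_bindings flag.
    for name, value in (gin_bindings or {}).items():
        args.append(f"--gin_bindings={name}={value}")
    args.append("--alsologtostderr")
    return args


if __name__ == "__main__":
    debug_bindings = {
        "Runner.evaluation_steps": 10,
        "RunnerWithTeacher.num_pretraining_iterations": 2,
        "RunnerWithTeacher.num_pretraining_steps": 10,
        "JaxDQNAgent.min_replay_history": 64,
    }
    # The checkpoint path below is a placeholder for $TEACHER_CKPT_DIR/Breakout/1.
    cmd = build_train_command("/path/to/teacher/Breakout/1", debug_bindings)
    print(shlex.join(cmd))
```

Because the command is built as a list, it can be passed to subprocess.run directly, which avoids mistakes with shell line continuations and quoting.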
If you find this open-source release useful, please cite it in your paper:
Agarwal, R., Schwarzer, M., Castro, P. S., Courville, A., & Bellemare, M. G. (2022). Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress. arXiv preprint arXiv:2206.01626.
@inproceedings{agarwal2022beyond,
title={Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress},
author={Agarwal, Rishabh and Schwarzer, Max and Castro, Pablo Samuel and Courville, Aaron and Bellemare, Marc G},
booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
year={2022}
}
Disclaimer: This is not an official Google product.