Improbable-AI / curiosity_baselines

An open source reinforcement learning codebase with a variety of intrinsic exploration methods implemented in PyTorch.
MIT License

Overview

This is a collection of curiosity algorithms implemented in PyTorch on top of the rlpyt deep RL codebase.

To-do

  1. Add the remaining curiosity models.
  2. Update the models directory with more environments.

Available Learning Algorithms

Policy Gradient: A2C, PPO

Replay Buffers (supporting both DQN + QPG): non-sequence and sequence (for recurrent) replay, n-step returns, uniform or prioritized replay, full-observation or frame-based buffer (e.g. for Atari, stores only unique frames to save memory and reconstructs multi-frame observations).

Deep Q-Learning: DQN + variants, including Double, Dueling, Categorical (up to Rainbow minus Noisy Nets), and Recurrent (R2D2-style)

Q-Function Policy Gradient: DDPG, TD3, SAC
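The n-step returns and the Double DQN variant listed above combine into a single bootstrap target. The sketch below is a generic textbook formulation of that target in plain Python, not rlpyt's implementation; the function `n_step_double_dqn_target` and its arguments are hypothetical, and it assumes the caller has already truncated `rewards` at any terminal step.

```python
from typing import Sequence


def n_step_double_dqn_target(
    rewards: Sequence[float],
    discount: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
) -> float:
    """Hypothetical helper: n-step Double DQN target for one transition.

    rewards: rewards r_t, ..., r_{t+n-1} observed after state s_t
             (assumed already cut off at a terminal step).
    done: True if the episode terminated within those n steps.
    q_online_next / q_target_next: Q-values of every action at s_{t+n}
             from the online and target networks, respectively.
    """
    # Discounted sum of the observed rewards: sum_k gamma^k * r_{t+k}.
    n_step_return = 0.0
    for k, r in enumerate(rewards):
        n_step_return += (discount ** k) * r
    if done:
        # Never bootstrap past a terminal state.
        return n_step_return
    # Double DQN: the online network selects the action...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...and the target network evaluates it.
    bootstrap = q_target_next[best_action]
    return n_step_return + (discount ** len(rewards)) * bootstrap


if __name__ == "__main__":
    target = n_step_double_dqn_target([1.0, 0.0, 1.0], 0.9, False, [2.0, 0.5], [1.0, 3.0])
    print(round(target, 3))  # 2.539
```

Here the online network prefers action 0, so the target network's value 1.0 is used instead of its maximum 3.0. Decoupling action selection from action evaluation in this way is what reduces the overestimation bias of vanilla DQN.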

Available Curiosity Algorithms

Prediction error: ICM, Disagreement

Count-based: RND

Learning progress: NDIGO
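RND, listed above, rewards the agent for visiting states whose features a trained predictor cannot yet reproduce from a fixed, randomly initialised target network. The sketch below illustrates that idea with single linear layers in plain Python so it runs without PyTorch. The class `TinyRND` and its parameters are hypothetical and do not mirror this repository's RND module, which uses deep networks.

```python
import random


class TinyRND:
    """Hypothetical, minimal Random Network Distillation bonus.

    Both "networks" are single linear layers so the sketch needs only the
    standard library; the actual method uses deep networks.
    """

    def __init__(self, obs_dim, feat_dim, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network (never trained).
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(feat_dim)]
        # Predictor network, trained to imitate the target's outputs.
        self.predictor = [[0.0] * obs_dim for _ in range(feat_dim)]
        self.lr = lr

    @staticmethod
    def _forward(weights, obs):
        # Linear layer: one dot product per output feature.
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    def intrinsic_reward(self, obs):
        # Novelty bonus = squared error between predictor and target features.
        pred = self._forward(self.predictor, obs)
        targ = self._forward(self.target, obs)
        return sum((p - t) ** 2 for p, t in zip(pred, targ))

    def update(self, obs):
        # One SGD step on 0.5 * ||pred - targ||^2 w.r.t. predictor weights only.
        pred = self._forward(self.predictor, obs)
        targ = self._forward(self.target, obs)
        for i, row in enumerate(self.predictor):
            err = pred[i] - targ[i]
            for j, x in enumerate(obs):
                row[j] -= self.lr * err * x


if __name__ == "__main__":
    rnd = TinyRND(obs_dim=4, feat_dim=8)
    seen, unseen = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
    before_seen = rnd.intrinsic_reward(seen)
    before_unseen = rnd.intrinsic_reward(unseen)
    for _ in range(50):
        rnd.update(seen)  # the agent keeps revisiting the same state
    print(rnd.intrinsic_reward(seen) < before_seen)  # True
    print(rnd.intrinsic_reward(unseen) == before_unseen)  # True
```

Because the predictor only improves on the states it is trained on, the bonus for a frequently visited state shrinks while a never-visited state keeps its original bonus. ICM instead uses the prediction error of a learned forward dynamics model, and Disagreement uses the variance across an ensemble of such models.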

Available Environments

Usage

  1. Clone this repo.

  2. If you plan to use MuJoCo, place your license key "mjkey.txt" in the base directory. It is copied into the container when you start Docker with the Makefile command.

  3. Make sure Docker is installed to run the image. We recommend the GPU image (labeled version_gpu), which works even if you are only using CPUs; a CPU-only image is provided as well.

  4. Edit global.json to customize volume mount points, port forwarding, and the Docker image versions pulled from the registry. The Makefile reads its settings from this file.

  5. The Makefile contains some basic commands. At the top of the Makefile we use node to read settings from global.json; node is not used for anything else.

    make start_docker # start the docker container and drop you in a shell
    make start_docker_gpu # start the docker container if running on a machine with GPUs
    make stop_docker # stop the docker container and clean up
    make clean # clean all subdirectories of pycache files etc.
  6. Before running anything, create an empty directory named "results" in the base directory.

  7. Run the launch file from the command line with your preferred arguments (see rlpyt/utils/launching/arguments.py for the complete list), for example:

    python3 launch.py -env breakout -alg ppo -curiosity_alg icm -lstm
  8. This launches your experiment in a tmux session named "experiment" with three windows: one running your code, one running htop for monitoring, and one serving TensorBoard on port 12345 (or the port specified in global.json).

  9. Results folders will be automatically generated in the results directory created in step 6.

  10. Example runs, including model weights and the exact hyperparameters for tested environments, can be found in the models directory.
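The launch command in step 7 passes single-dash flags such as `-env` and `-lstm`. As an illustration of how flags in that style can be parsed, here is a hypothetical `argparse` sketch covering only the four flags from that example. It is not the repository's parser (the complete argument list lives in rlpyt/utils/launching/arguments.py), and the defaults and help strings are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical subset of the launch flags; the real parser in
    # rlpyt/utils/launching/arguments.py defines many more and may differ.
    parser = argparse.ArgumentParser(description="Sketch of launch.py-style flags")
    parser.add_argument("-env", type=str, default="breakout", help="environment name (assumed default)")
    parser.add_argument("-alg", type=str, default="ppo", help="RL algorithm, e.g. ppo or a2c")
    parser.add_argument("-curiosity_alg", type=str, default="none", help="intrinsic reward model, e.g. icm")
    parser.add_argument("-lstm", action="store_true", help="use a recurrent (LSTM) policy")
    return parser


if __name__ == "__main__":
    # Same arguments as the example command in step 7.
    args = build_parser().parse_args(["-env", "breakout", "-alg", "ppo", "-curiosity_alg", "icm", "-lstm"])
    print(args.env, args.alg, args.curiosity_alg, args.lstm)  # breakout ppo icm True
```

`argparse` accepts multi-character options with a single dash, and derives the attribute name by stripping the dash, so `-curiosity_alg` becomes `args.curiosity_alg`.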

Notes

For more information on the rlpyt core codebase, see the rlpyt white paper on arXiv. If you use this repository in your work or otherwise wish to cite it, please cite that white paper.

Code Organization

The class types perform the following roles:

Sources and Acknowledgements

This codebase is currently funded by Amazon MLRA - we thank them for their support.

Parts of the following open source codebases were used to make this codebase possible. Thanks to all of them for their amazing work!

Thanks to Prof. Pulkit Agrawal and the members of the Improbable AI lab at MIT CSAIL for their continued guidance and support.