AOS55 / url-suite

Unsupervised Reinforcement Learning Suite

The Unsupervised Reinforcement Learning Suite (URLS)

URLS aims to provide a set of unsupervised reinforcement learning algorithms and experiments for the purpose of researching the applicability of unsupervised reinforcement learning to a variety of paradigms.

The codebase is based upon URLB and ExORL; further details are provided in their respective papers.

URLS is intended as a successor to URLB, allowing for a wider range of experiments and RL paradigms.

Prerequisites

Install MuJoCo if it is not already installed:

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate urls-env
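
As an optional sanity check that MuJoCo and dm_control installed correctly, you can load a control-suite task and step it once. This is a minimal sketch and not part of the suite itself:

```python
# Optional check that MuJoCo and dm_control are working.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="walker", task_name="walk")
time_step = env.reset()
action_spec = env.action_spec()

# Step once with a random action; a broken MuJoCo install fails here.
action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                           size=action_spec.shape)
time_step = env.step(action)
print("reward:", time_step.reward,
      "observation keys:", list(time_step.observation.keys()))
```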

Workflow

We provide the following workflows:

Unsupervised Reinforcement Learning

Pre-training: learn from the agent's intrinsic reward on a specific domain

  python pretrain.py agent=UNSUPERVISED_AGENT domain=DOMAIN

Fine-tuning: learn on a specific task with the pre-trained agent; the task-specific reward is now used to train the agent

  python finetune.py pretrained_agent=UNSUPERVISED_AGENT task=TASK snapshot_ts=TS obs_type=OBS_TYPE
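
For example, to pre-train ICM on the walker domain and then fine-tune on the run task (the task name, snapshot step, and observation type below are illustrative values following URLB's conventions, and may need adjusting):

  python pretrain.py agent=icm domain=walker
  python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=100000 obs_type=states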

Offline Learning from Unsupervised Reinforcement Learning

Pre-training: learn from the agent's intrinsic reward on a specific domain

  python pretrain.py agent=UNSUPERVISED_AGENT domain=DOMAIN

Sampling: sample demonstrations from the agent's replay buffer on a specific task

  python sampling.py agent=UNSUPERVISED_AGENT task=TASK samples=SAMPLES snapshot_ts=TS obs_type=OBS_TYPE

Offline learning: learn a policy on the specific task using the collected offline data

  python train_offline.py agent=OFFLINE_AGENT expl_agent=UNSUPERVISED_AGENT task=TASK
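
For example, to collect demonstrations with DIAYN and then train a behavioral-cloning policy on them (the task name, sample count, snapshot step, and observation type below are illustrative):

  python pretrain.py agent=diayn domain=walker
  python sampling.py agent=diayn task=walker_run samples=10000 snapshot_ts=100000 obs_type=states
  python train_offline.py agent=bc expl_agent=diayn task=walker_run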

Safe Reinforcement Learning

Pre-training: learn from the agent's intrinsic reward on a specific domain

  python pretrain.py agent=UNSUPERVISED_AGENT domain=DOMAIN

Sampling: sample demonstrations from the agent's replay buffer, with constraints and images

  python sampling.py agent=UNSUPERVISED_AGENT task=TASK samples=SAMPLES snapshot_ts=TS obs_type=OBS_TYPE

Trajectories to images: create an image dataset from the sampled trajectories

  python data_to_images.py --env=DOMAIN

Train VAE: train a Variational Autoencoder on the image dataset

  python train_encoder.py --env=DOMAIN

Train MPC: train the LS3 safe model predictive controller on a specific domain

  python train_mpc.py --env=DOMAIN
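
The Train VAE step learns a compact latent representation of the collected images for the downstream safe MPC. The sketch below shows the kind of reconstruction-plus-KL objective involved, written in PyTorch with illustrative layer sizes; it is not the repository's encoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Illustrative fully-connected VAE over flattened images."""

    def __init__(self, obs_dim=3 * 64 * 64, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)
        self.log_var = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, obs_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterisation trick: sample z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        return self.decoder(z), mu, log_var

def vae_loss(recon, x, mu, log_var):
    # Reconstruction error plus KL divergence to the unit Gaussian prior.
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + kl
```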

Further details can be found here.


Unsupervised Agents

The following unsupervised reinforcement learning agents are available; replace UNSUPERVISED_AGENT with the corresponding Command from the table below. For example, to use DIAYN, set UNSUPERVISED_AGENT=diayn.

| Agent | Command | Type | Implementation Author(s) | Paper | Intrinsic Reward |
|---|---|---|---|---|---|
| ICM | icm | Knowledge | Denis | paper | $\lVert g(\mathbf{z}_{t+1} \mid \mathbf{z}_{t}, \mathbf{a}_{t}) - \mathbf{z}_{t+1} \rVert^{2}$ |
| Disagreement | disagreement | Knowledge | Catherine | paper | $\mathrm{Var}\{ g_{i}(\mathbf{z}_{t+1} \mid \mathbf{z}_{t}, \mathbf{a}_{t}) \}$ |
| RND | rnd | Knowledge | Kevin | paper | $\lVert g(\mathbf{z}_{t}, \mathbf{a}_{t}) - \tilde{g}(\mathbf{z}_{t}, \mathbf{a}_{t}) \rVert^{2}_{2}$ |
| APT(ICM) | icm_apt | Data | Hao, Kimin | paper | $\sum_{j \in \mathrm{random}} \log \lVert \mathbf{z}_{t} - \mathbf{z}_{j} \rVert$ |
| APT(Ind) | ind_apt | Data | Hao, Kimin | paper | $\sum_{j \in \mathrm{random}} \log \lVert \mathbf{z}_{t} - \mathbf{z}_{j} \rVert$ |
| ProtoRL | proto | Data | Denis | paper | $\sum_{j \in \mathrm{random}} \log \lVert \mathbf{z}_{t} - \mathbf{z}_{j} \rVert$ |
| DIAYN | diayn | Competence | Misha | paper | $\log q(\mathbf{w} \mid \mathbf{z}) + \mathrm{const}$ |
| APS | aps | Competence | Hao, Kimin | paper | $r_{t}^{\mathrm{APT}}(\mathbf{z}) + \log q(\mathbf{z} \mid \mathbf{w})$ |
| SMM | smm | Competence | Albert | paper | $\log p^{\ast}(\mathbf{z}) - \log q_{\mathbf{w}}(\mathbf{z}) - \log p(\mathbf{w}) + \log d(\mathbf{w} \mid \mathbf{z})$ |
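
As an illustration of how such intrinsic rewards are computed, the sketch below implements an RND-style bonus: a predictor network $g$ is trained to match a fixed, randomly initialised target network, and the prediction error is the reward. This is a minimal PyTorch sketch conditioned on observations only, with assumed shapes; it is not the repository's implementation:

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, feat_dim = 24, 64
predictor = mlp(obs_dim, feat_dim)   # g: trained online
target = mlp(obs_dim, feat_dim)      # g~: frozen, randomly initialised
for p in target.parameters():
    p.requires_grad_(False)

def rnd_intrinsic_reward(obs):
    """Per-sample squared prediction error ||g(z) - g~(z)||^2."""
    with torch.no_grad():
        target_feat = target(obs)
    error = (predictor(obs) - target_feat).pow(2).sum(dim=-1)
    return error  # its mean also serves as the predictor's training loss

obs = torch.randn(32, obs_dim)       # a batch of (encoded) observations
reward = rnd_intrinsic_reward(obs)
```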

Offline Agents

The following five RL procedures are available to learn a policy offline from the unsupervised data. Replace OFFLINE_AGENT with the corresponding Command from the table below; for example, to use behavioral cloning, set OFFLINE_AGENT=bc.

| Offline RL Procedure | Command | Paper |
|---|---|---|
| Behavior Cloning | bc | paper |
| CQL | cql | paper |
| CRR | crr | paper |
| TD3+BC | td3_bc | paper |
| TD3 | td3 | paper |
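
To give a sense of how these procedures differ, TD3+BC, for example, regularises the TD3 actor update with a behavioral-cloning term towards the dataset actions. A minimal sketch of that actor loss (networks and shapes are assumed; the lambda normalisation follows the TD3+BC paper):

```python
import torch
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, obs, dataset_actions, alpha=2.5):
    """Actor loss = -lambda * Q(s, pi(s)) + MSE(pi(s), a_dataset)."""
    pi = actor(obs)
    q = critic(obs, pi)
    # lambda rescales the Q term so the BC term keeps a comparable magnitude.
    lam = alpha / q.abs().mean().detach()
    return -lam * q.mean() + F.mse_loss(pi, dataset_actions)
```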

Environments

The following environments with specific domains and tasks are provided. We also provide a wrapper to convert Gym environments to DMC extended time-step types, based on DeepMind's acme wrapper; a sketch of the idea follows the table.

| Environment Type | Domain | Task |
|---|---|---|
| DeepMind Control | walker | stand, walk, run, flip |
| DeepMind Control | quadruped | walk, run, stand, jump |
| DeepMind Control | jaco | reach_top_left, reach_top_right, reach_bottom_left, reach_bottom_right |
| DeepMind Control | cheetah | run, run_backward |
| Gym Box2D | BipedalWalker-v3 | walk |
| Gym Box2D | CarRacing-v1 | race |
| Gym Classic Control | MountainCarContinuous-v0 | goal |
| Safe Control | SimplePointBot | goal |
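
The Gym-to-DMC conversion mentioned above amounts to mapping Gym's (observation, reward, done, info) step tuples onto dm_env TimeStep objects. A minimal sketch of that idea using the classic Gym API and the dm_env package (this is not the suite's acme-based wrapper itself):

```python
import dm_env
from dm_env import specs
import gym
import numpy as np

class GymToDmEnv(dm_env.Environment):
    """Wrap a classic-API Gym env so it returns dm_env.TimeStep objects."""

    def __init__(self, env: gym.Env):
        self._env = env

    def reset(self) -> dm_env.TimeStep:
        obs = self._env.reset()
        return dm_env.restart(np.asarray(obs, dtype=np.float32))

    def step(self, action) -> dm_env.TimeStep:
        obs, reward, done, _ = self._env.step(action)
        obs = np.asarray(obs, dtype=np.float32)
        if done:
            return dm_env.termination(reward=reward, observation=obs)
        return dm_env.transition(reward=reward, observation=obs)

    def observation_spec(self):
        space = self._env.observation_space
        return specs.BoundedArray(space.shape, np.float32,
                                  space.low, space.high, name="observation")

    def action_spec(self):
        space = self._env.action_space
        return specs.BoundedArray(space.shape, np.float32,
                                  space.low, space.high, name="action")
```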

License

The majority of URLS, including the ExORL- and URLB-based code, is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind code is licensed under the Apache 2.0 license.