This repo contains code for our paper Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future
The code base contains multiple branches.
Based on code base for the BabyAI project at Mila. https://github.com/mila-iqia/babyai
Follow similar installations as in https://github.com/mila-iqia/babyai.
Requirements:
Requirements:
Start by manually installing PyTorch. See the PyTorch website for installation instructions specific to your platform.
Then, clone this repository and install the other dependencies with pip3
:
git clone https://github.com/facebookresearch/modeling_long_term_future.git
cd modeling_long_term_future
pip3 install --editable .
Create a new conda env using env.yml in the repo
We use the BabyAI Pickup-Unlock game.
First train the teacher (for imitation learning) using PPO with curriculum learning. Start with a room size of 6 and then work our way up to room size of 15.
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 6
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 8 --model MODEL_ROOM6_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 10 --model MODEL_ROOM8_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 12 --model MODEL_ROOM10_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 15 --model MODEL_ROOM12_PRETRAINED
Generate expert trajectories from the experts trained using curriculum learning
mnkdir data
python3 -m scripts.gen_samples --episodes 10000 --env BabyAI-UnlockPickup-v0 --model pretrained_model_room_10 --room 10
To run our model
python3 -m scripts.zforcing_main_state_dec --env BabyAI-UnlockPickup-v0 --datafile EXPERT_DATA_TO_LOAD --model MODEL_NAME --eval-episodes 100 --eval-interval 200 --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.0001 --bwd-l2-weight 1. --kld-weight-start 0.2 --aux-weight-end 0.0001 --room 10
To run the baseline
python3 -m scripts.zforcing_main_state_dec --datafile EXPERT_DATA_TO_LOAD --env BabyAI-UnlockPickup-v0 --model MODEL_NAME --eval-episodes 100 --eval-interval 200 --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.000 --aux-weight-end 0.0 --room 10
Find license in LICENSE file.