Source code to accompany *Off-Policy Adversarial Inverse Reinforcement Learning*.

If you use this code for your research, please consider citing the paper:
```
@article{arnob2020off,
  title={Off-Policy Adversarial Inverse Reinforcement Learning},
  author={Arnob, Samin Yeasar},
  journal={arXiv preprint arXiv:2005.01138},
  year={2020}
}
```
Requirements (Python 2 environment):

* `inverse_rl` (from https://github.com/justinjfu/inverse_rl)
* `rllab` (https://github.com/openai/rllab)
* `sandbox`
* PyTorch
* Python 2
* mjpro131
* mujoco-py==0.5.7 (`pip install mujoco-py==0.5.7`)

Requirements (Python 3 environment, for training):

* PyTorch
* Python 3
* mujoco-py==1.50.1.68
To train:

```
python Train.py --seed 0 \
    --env_name "HalfCheetah-v2" \
    --learn_temperature \
    --policy_name "SAC"
```
Descriptions of the arguments follow:

* Environment options: `HalfCheetah-v2`, `Ant-v2`, `Hopper-v2`, `Walker2d-v2`, `CustomAnt-v0`, `PointMazeLeft-v0`
* Policy options: `SAC`, `SAC_MCP` (k=8 primitive policies), `SAC_MCP2` (k=4 primitive policies)
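The flags above could be wired up with `argparse` roughly as follows. This is a minimal sketch of the command-line interface, not the actual `Train.py`; the flag names come from the README, while the defaults and `choices` are assumptions:

```python
import argparse

# Hypothetical sketch of Train.py's CLI; defaults are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--env_name", type=str, default="HalfCheetah-v2")
parser.add_argument("--policy_name", type=str, default="SAC",
                    choices=["SAC", "SAC_MCP", "SAC_MCP2"])
# store_true: passing --learn_temperature turns the flag on
parser.add_argument("--learn_temperature", action="store_true")

args = parser.parse_args(["--env_name", "Ant-v2", "--learn_temperature"])
print(args.env_name, args.policy_name, args.learn_temperature)
```

Boolean switches such as `--learn_temperature` take no value on the command line, which matches how the commands in this README pass them.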
To retrain:

```
python ReTrain.py --seed 0 \
    --env_name "DisabledAnt-v0" \
    --learn_temperature \
    --policy_name "SAC" \
    --initial_state "random" \
    --initial_runs "policy_sample" \
    --load_gating_func \
    --learn_actor
```
Descriptions of the arguments follow:

* Environment options: `DisabledAnt-v0`, `PointMazeRight-v0`
* `learn_temperature`: automatically tunes the SAC entropy temperature
* Policy options: `SAC`, `SAC_MCP` (k=8 primitive policies), `SAC_MCP2` (k=4 primitive policies)
* `initial_state`: `zero` — the environment always starts from the same state; `random` — the environment starts from random states
* `initial_runs`: `policy_sample`
* `load_gating_func` (for `SAC_MCP` and `SAC_MCP2`): loads the gating function learned during imitation training
* `learn_actor` (for `SAC_MCP` and `SAC_MCP2`): if set, trains both the policy and the gating function; otherwise only the gating function is trained
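To illustrate what the gating function does in the `SAC_MCP`-style policies, here is a hedged sketch: a state-dependent gate produces k weights that combine the actions of k primitive policies. The function names, the linear toy primitives, and the softmax combination are all assumptions for illustration, not the repository's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def composite_action(state, primitives, gating):
    """Combine k primitive policies' actions with state-dependent weights.

    `primitives` (list of k callables) and `gating` (callable returning k
    logits) are hypothetical stand-ins for the networks a SAC_MCP-style
    agent would learn; this is an illustrative sketch only.
    """
    weights = softmax(gating(state))                    # (k,), sums to 1
    actions = np.stack([p(state) for p in primitives])  # (k, action_dim)
    return weights @ actions                            # convex combination

# Toy usage: k=4 random linear primitives and a random linear gate.
rng = np.random.default_rng(0)
state = rng.normal(size=3)
prims = [(lambda s, W=rng.normal(size=(2, 3)): W @ s) for _ in range(4)]
gate = lambda s, G=rng.normal(size=(4, 3)): G @ s
a = composite_action(state, prims, gate)
print(a.shape)  # (2,)
```

Under this reading, `load_gating_func` restores the learned gate while `learn_actor` decides whether the primitives are updated alongside it during retraining.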