Install MuJoCo at ~/.mujoco/mjpro150 and copy your license key to ~/.mujoco/mjkey.txt. Then install the Python dependencies:
pip install -r requirements.txt
All the data that RL2S needs is saved in ./l2s_dataset, with one directory per environment, e.g. ./l2s_dataset/hopper. The training data for each policy is stored as ./l2s_dataset/hopper/train_data/dataset_policy_{index}.pkl.
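The exact contents of these pickle files depend on the data-collection code; as a minimal sketch, assuming each dataset_policy_{index}.pkl holds a dictionary of transition arrays (an assumption, not the repo's documented format), a file can be inspected like this:

```python
import os
import pickle
import tempfile

# Hypothetical layout: the real dataset_policy_{index}.pkl may store a
# different structure, so inspect the keys before relying on them.
dummy = {"observations": [[0.0, 1.0]], "actions": [[0.5]], "rewards": [1.0]}

path = os.path.join(tempfile.mkdtemp(), "dataset_policy_0.pkl")
with open(path, "wb") as f:
    pickle.dump(dummy, f)

with open(path, "rb") as f:
    data = pickle.load(f)

print(sorted(data.keys()))  # shows which fields the file provides
```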
You can specify the data used to train and evaluate the simulator through train_data_index and test_data_index in l2s_demos_listing.yaml.
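A listing entry might look like the fragment below; apart from train_data_index and test_data_index, the key names are assumptions, so check the l2s_demos_listing.yaml shipped with the repo:

```yaml
# Hypothetical layout -- only train_data_index / test_data_index are
# taken from the text above; verify against the actual file.
hopper:
  train_data_index: [0, 1, 2, 3]
  test_data_index: [4, 5]
```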
Run
python run_online_sac.py -e exp_specs/online_hopper.yaml --nosrun -c 0
to train online SAC policies, which are saved as policy_{index}.pkl. Then run
python3 utils_script.py -d hopper -t 2 -g 0
to sample data for each policy, and
python3 utils_script.py -d hopper -t 3 -g 0
to compute the mean and std of its observations, which will be used in exp_specs/l2s_hopper.yaml.
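For reference, the normalization statistics are just per-dimension moments of the sampled observations. A minimal sketch, assuming observations are fixed-length vectors (the real -t 3 code may differ):

```python
import math

def obs_mean_std(observations):
    """Per-dimension mean and (population) std of a list of observation vectors."""
    n = len(observations)
    dim = len(observations[0])
    mean = [sum(o[d] for o in observations) / n for d in range(dim)]
    std = [math.sqrt(sum((o[d] - mean[d]) ** 2 for o in observations) / n)
           for d in range(dim)]
    return mean, std

# Tiny example: two 2-D observations.
mean, std = obs_mean_std([[0.0, 2.0], [2.0, 2.0]])
print(mean, std)  # per-dimension statistics
```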
The learned dynamics are stored in ./l2s_dataset/leaned_dynamic/hopper, and the end data in ./l2s_dataset/end_data/hopper.pkl.
Before running experiments, you should check that the index in l2s_demo_listings.yaml corresponds to the index of the policies in l2s_dataset.
To run RL2S, please use a command like the one below, with use_robust in l2s_hopper.yaml set to true. During training, the AVD and MVD will be logged in ./l2s_logs/RL2S/.../progress.csv.
python3 run_l2s.py -e exp_specs/l2s_hopper.yaml --nosrun -c 0
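The logged metrics can then be pulled out of progress.csv with the standard csv module. The column names below (AVD, MVD) are assumptions based on the description above, so check the actual header row of your log file:

```python
import csv
import io

# Hypothetical progress.csv contents; the real file lives under
# ./l2s_logs/RL2S/ and may use different column names.
raw = "Epoch,AVD,MVD\n0,0.50,1.20\n1,0.40,1.10\n"

rows = list(csv.DictReader(io.StringIO(raw)))
avd = [float(r["AVD"]) for r in rows]  # one value per logged epoch
mvd = [float(r["MVD"]) for r in rows]
print(avd, mvd)
```

For a real run, replace the StringIO with `open(".../progress.csv")`.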
For GAIL, just set use_robust to false.
Please use a command like this to get the performance of the policy in the learned simulator.
python3 utils_script.py -d hopper -t 0 -g 0
Please use a command like this to compute the Kendall rank correlation coefficient and nDCG.
python3 utils_script.py -d hopper -t 1
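As a reference for what -t 1 reports, both metrics can be computed from two score lists in a few lines. This is a generic sketch of the standard definitions, not the repo's exact implementation:

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation (tau-a, no tie correction) of two score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)

def ndcg(relevances):
    """Normalized discounted cumulative gain of a ranked relevance list."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(relevances))
    idcg = sum(r / math.log2(i + 2)
               for i, r in enumerate(sorted(relevances, reverse=True)))
    return dcg / idcg if idcg > 0 else 0.0

# Example: true policy returns vs. returns estimated in the simulator.
tau = kendall_tau([3.0, 2.0, 1.0], [2.5, 2.0, 0.5])
score = ndcg([3.0, 2.0, 1.0])
print(tau, score)
```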
For policy improvement, run the command below.
python3 run_l2s_downstream.py -e exp_specs/l2s_downstream_hopper.yaml --nosrun -c 2