allenai / advisor

Apache License 2.0
8 stars 0 forks source link

Bridging the Imitation Gap by Adaptive Insubordination

This repository includes the code that can be used to reproduce the results of our paper Bridging the Imitation Gap by Adaptive Insubordination.

Our experiments were primarily run using the AllenAct learning framework, see AllenAct.org for details and tutorials.

šŸ”Ž Table of contents

šŸ’» Installation

Simply create a new python virtual environment and, within this virtual environment, run

pip install -r requirements.txt

šŸ PoisonedDoors and MiniGrid Experiments

šŸ“ˆ Generating Hyperparameter Plots from TSVs

We have included 9 tsv files in the hp_runs directory corresponding to the PoisonedDoors task and the 8 MiniGrid (described in Appendix A.5 of the paper). Each TSV file contains the results from training 50 hyperparameterized models for each of the 15 baselines included in this work (see Table A.1 in our paper's appendix).

Using the following command the user can generate the plots for the task WallCrossingCorruptExpertS25N10 i.e. WC Corrupt (S25, N10) of the submission (runtime: < 1min).

python minigrid_and_pd_scripts/summarize_random_hp_search.py --env_name AskForHelpLavaCrossingOnce

Running the above command will print something similar to:

random_hp_search_runs_AskForHelpLavaCrossingOnce.tsv
{'ppo': 50, 'bc': 50, 'dagger': 50, 'bc_teacher_forcing': 50, 'bc_with_ppo': 50, 'bc_then_ppo': 50, 'dagger_then_ppo': 50, 'bc_teacher_forcing_then_ppo': 50, 'pure_offpolicy': 50, 'ppo_with_offpolicy': 50, 'gail': 50, 'advisor': 50, 'bc_teacher_forcing_then_advisor': 50, 'dagger_then_advisor': 50, 'ppo_with_offpolicy_advisor': 50}
750
Starting expected max reward computations
Computing expected max reward for PPO
Computing expected max reward for BC
Computing expected max reward for DAgger $(\dagger)$
Computing expected max reward for BC$^{\text{tf}=1}$
Computing expected max reward for BC$+$PPO (static)
Computing expected max reward for BC$ \to$ PPO
Computing expected max reward for $\dagger \to$ PPO
Computing expected max reward for BC$^{\text{tf}=1} \to$ PPO
Computing expected max reward for BC$^{\text{demo}}$
Computing expected max reward for BC$^{\text{demo}} +$ PPO
Computing expected max reward for GAIL
Computing expected max reward for ADV
Computing expected max reward for BC$^{\text{tf}=1} \to$ ADV
Computing expected max reward for $\dagger \to$ ADV
Computing expected max reward for ADV$^{\text{demo}} +$ PPO
findfont: Font family ['serif'] not found. Falling back to DejaVu Sans.
Figure saved to hp_runs/neurips21_plots/AskForHelpLavaCrossingOnce/random_hp_search_runs_AskForHelpLavaCrossingOnce__train_steps_AskForHelpLavaCrossingOnce_reward.pdf
Figure saved to hp_runs/neurips21_plots/AskForHelpLavaCrossingOnce/random_hp_search_runs_AskForHelpLavaCrossingOnce__hpruns_AskForHelpLavaCrossingOnce_reward.pdf

indicating each of the 15 baselines and the number of hyperparameters searched per model. The last two line of this output shows the location where the resulting plots were saved.

By simply omitting the --env_name flag i.e. using the following command, all the tasks would be processed (runtime: < 5min):

python minigrid_and_pd_scripts/summarize_random_hp_search.py

We have already included these pdfs, and the above commands would regenerate and overwrite them

šŸ§® Generating TSVs

We have included the scripts for generating the tsv files described above. These scripts take 18-24 hours on a 48 CPU, 4 NVIDIA T4 GPU machine (g4dn.12xlarge AWS instance). The minigrid_and_pd_scripts/random_hp_search.py script achieves this. For WallCrossingS25N10 task, this can be done with the following command (runtime: < 24 hours on g4dn.12xlarge):

python projects/advisor/minigrid_and_pd_scripts/random_hp_search.py \
    RUN \
    -m 1 \
    -b projects/advisor/minigrid_and_pd_experiments \
    --output_dir hp_runs \
    --log_level error \
    --env_name CrossingS25N10

As mentioned before, the above command runs 50 models for all baselines including those based solely on expert demonstrations. For these to function, we need to save demonstrations for each of the tasks. The minigrid_and_pd_scripts/save_expert_demos.py script handles this. For eg., to save demonstrations for the CrossingS25N10 task, use the following (runtime: < 15 mins on g4dn.12xlarge):

python minigrid_and_pd_scripts/save_expert_demos.py bc \
    -b minigrid_and_pd_experiments/ \
    -o minigrid_data/minigrid_demos \
    --env_name CrossingS25N10

šŸ’”šŸ  2D-Lighthouse Experiments

For 2D-Lighthouse, we have included all the code under the lighthouse_experiments and lighthouse_scripts directories. To reproduce Figures 6 and A.3 from our paper see the lighthouse_scripts/save_pairwise_imitation_data.py and lighthouse_scripts/summarize_pairwise_imitation_data.py scripts. The TSV files produced by the save_pairwise_imitation_data.py can be found in the ./hp_runs/lighthouse/pairwise_imitation_results directory. The figures corresponding to these TSV files are generated by the summarize_pairwise_imitation_data.py; we have generated these files for you, they can be found in the ./hp_runs/lighthouse/metric_comparisons directory.

šŸ… RoboTHOR ObjectNav Experiments

Our RoboTHOR ObjectNav experiment configuration files can be found in the objectnav_experiments/robothor directory. Note that these experiments only include the ADVISOR and BC+PPO baselines, our other two baselines (BC and PPO) are taken from the ObjectNav baselines in AllenAct.

šŸ¤ OpenAI Multi Agent Particle Environment Experiments

Due to the nature of the MPE experiments, we have conducted them in a separate codebase. You can find the code for MPE CoopNav here.

šŸš§ Habitat PointNav Experiments

Due to the nature of the Habitat, we have conducted them in a separate codebase and is under work.

šŸ“œ Citation

If you use this work, please cite:

@inproceedings{advisor,
  title={Bridging the Imitation Gap by Adaptive Insubordination},
  author={Weihs, Luca and Jain, Unnat and Liu, Iou-Jen and Salvador, Jordi and Lazebnik, Svetlana and Kembhavi, Aniruddha and Schwing, Alexander},
  booktitle={NeurIPS},
  year={2021},
  note = {the first two authors contributed equally},
}