This repository is an implementation of the paper "Optimizing Millions of Hyperparameters by Implicit Differentiation".
Create a Python 3.7 environment and install required packages:
conda create -n ift-env python=3.7
source activate ift-env
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
pip install -r requirements.txt
Install JupyterLab:
conda install -c conda-forge jupyterlab
Run the following test to verify that the environment is set up correctly:
python mnist_test.py
--datasize <train set size>
--valsize <validation set size>
--lrh <hyperparameter learning rate; should be negative>
--epochs <minimum number of epochs to train the model>
--hepochs <number of hyperparameter update iterations>
--l2 <initial log weight decay>
--restart <whether to reinitialize model weights after each hyperparameter update>
--model <cnn for a LeNet-like model, mlp for logistic regression and MLP>
--dataset <CIFAR10 or MNIST>
--num_layers <number of hidden layers for the MLP>
--hessian <KFAC: KFAC estimate; direct: true Hessian and its inverse>
--jacobian <direct: true Jacobian; product: use d_L/d_theta * d_L/d_lambda>
Trained models after each hyperparameter update are stored in the folder defined on line 627 of mnist_test.py. To use CG to compute the inverse of the Hessian, change the hyperparameter updater on line 660 of mnist_test.py. For example:
python mnist_test.py --datasize 40000 --valsize 10000 --lrh 0.01 --epochs=100 --hepochs=10 --l2=1e-5 --restart=10 --model=mlp --dataset=MNIST --num_layers=1 --hessian=KFAC --jacobian=direct
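For reference, the hypergradient behind these options follows the implicit function theorem: the gradient of the validation loss with respect to the hyperparameters passes through an (approximate) inverse of the training-loss Hessian. Below is a minimal PyTorch sketch of that computation using a truncated Neumann series for the inverse Hessian-vector product; the function and argument names are illustrative and not the repository's exact API.

import torch
from torch.autograd import grad

def neumann_inverse_hvp(v, dtrain_dw, params, alpha=0.1, num_terms=5):
    # Approximate H^{-1} v, where H = d^2 L_train / dw^2, with the truncated
    # Neumann series alpha * sum_{i=0}^{num_terms} (I - alpha * H)^i v.
    p = [x.detach().clone() for x in v]
    cur = [x.detach().clone() for x in v]
    for _ in range(num_terms):
        hvp = grad(dtrain_dw, params, grad_outputs=cur, retain_graph=True)
        cur = [c - alpha * h for c, h in zip(cur, hvp)]
        p = [acc + c for acc, c in zip(p, cur)]
    return [alpha * x for x in p]

def ift_hypergradient(val_loss, train_loss, params, hparams):
    # d L_val / d lambda = - (d^2 L_train / d lambda d w) H^{-1} (d L_val / d w)
    v = grad(val_loss, params, retain_graph=True)
    dtrain_dw = grad(train_loss, params, create_graph=True)
    inv_hvp = neumann_inverse_hvp(v, dtrain_dw, params)
    mixed = grad(dtrain_dw, hparams, grad_outputs=inv_hvp)
    return [-g for g in mixed]

Swapping the Neumann approximation for a conjugate-gradient solve (or a KFAC estimate) changes only the neumann_inverse_hvp step.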
First, make sure you are on the master node:
ssh <USERNAME>@q.vectorinstitute.ai
Submit a job to the Slurm scheduler:
srun --partition=gpu --gres=gpu:1 --mem=4GB python mnist_test.py
Or, submit a batch of jobs defined in srun_script.sh:
sbatch --array=0-2 srun_script.sh
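As a hypothetical illustration of how an array job can dispatch different runs (srun_script.sh itself may do this differently), each array task can read SLURM_ARRAY_TASK_ID and pick one configuration:

import os
import subprocess

# Hypothetical configuration list; indices 0-2 match sbatch --array=0-2.
configs = [
    ["--l2=1e-5", "--model=mlp"],
    ["--l2=1e-4", "--model=mlp"],
    ["--l2=1e-3", "--model=cnn"],
]
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
subprocess.run(["python", "mnist_test.py"] + configs[task_id], check=True)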
View queued jobs for a user:
squeue -u $USERNAME
Cancel jobs for a user:
scancel -u $USERNAME
Cancel a specific job:
scancel $JOBID
The commands below deploy experiments with and without Slurm. To deploy data generation for all of the experiments:
sbatch run_all.sh
Data Augmentation Network
python train_augment_net2.py --use_augment_net
Loss Reweighting Network
python train_augment_net2.py --use_reweighting_net --loss_weight_type=softmax
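The loss reweighting idea is that a small network assigns each training example a weight, and the weights are normalized with a softmax before multiplying the per-example losses. A minimal sketch of that weighting step, assuming a weight network that outputs one logit per example (this is not the exact code in train_augment_net2.py):

import torch
import torch.nn.functional as F

def softmax_reweighted_loss(logits, targets, weight_logits):
    # Per-example cross-entropy losses, shape (batch,).
    per_example = F.cross_entropy(logits, targets, reduction="none")
    # Softmax over the batch turns the weight network's logits into weights
    # that sum to 1; scaling by the batch size keeps the loss magnitude
    # comparable to an unweighted mean.
    weights = F.softmax(weight_logits.view(-1), dim=0) * per_example.numel()
    return (weights * per_example).mean()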
The LSTM code in this repository is built on the AWD-LSTM codebase.
These commands should be run from inside the rnn folder.
First, download the PTB dataset by running:
./getdata.sh
To tune LSTM hyperparameters with 1-step unrolling:
python train.py
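1-step unrolling differentiates the validation loss through a single training update of the weights. A minimal sketch, assuming a scalar log weight-decay hyperparameter and a plain SGD step (the actual handling of the LSTM regularization hyperparameters in train.py may differ):

import torch
from torch.autograd import grad

def one_step_unrolled_hypergrad(train_loss_fn, val_loss_fn, params, log_wdecay, lr=0.1):
    # Differentiable training step: w' = w - lr * d(L_train + wd * ||w||^2) / dw.
    wd = torch.exp(log_wdecay)
    train_loss = train_loss_fn(params) + wd * sum((p ** 2).sum() for p in params)
    grads = grad(train_loss, params, create_graph=True)
    new_params = [p - lr * g for p, g in zip(params, grads)]
    # Validation loss at the updated weights, differentiated w.r.t. the hyperparameter.
    val_loss = val_loss_fn(new_params)
    return grad(val_loss, log_wdecay)[0]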
To train an STN, run the following command from inside the stn folder:
python hypertrain.py --tune_all --save
To train a baseline ResNet-18 on CIFAR-10 with data augmentation and save checkpoints:
python train_checkpoint.py --dataset cifar10 --model resnet18 --data_augmentation
To fine-tune a saved checkpoint:
python finetune_checkpoint.py --load_checkpoint=baseline_checkpoints/cifar10_resnet18_sgdm_lr0.1_wd0.0005_aug1.pt --num_finetune_epochs=10 --wdecay=1e-4
For each experiment, explain what it does and which figure in the paper it corresponds to.
To run a Python script:
python script.py
To deploy with Slurm:
srun ...
Repository structure:

.
├── HAM_dataset.py
├── README.md
├── cutout.py
├── data_loaders.py
├── finetune_checkpoint.py
├── finetune_ift_checkpoint.py
├── grid_search.py
├── images
├── inverse_comparison.py
├── isic_config.py
├── isic_loader.py
├── kfac.py
├── kfac_utils.py
├── minst_ref.py
├── mnist_test.py
├── models
│ ├── __init__.py
│ ├── resnet.py
│ ├── resnet_cifar.py
│ ├── simple_models.py
│ ├── unet.py
│ └── wide_resnet.py
├── papers
│ ├── haoping_project
│ │ ├── main.tex
│ │ ├── neurips2019.tex
│ │ ├── neurips_2019.sty
│ │ └── references.bib
│ └── nips
│ ├── main.tex
│ ├── neurips_2019.sty
│ └── references.bib
├── random_search.py
├── requirements.txt
├── rnn
│ ├── config_scripts
│ │ ├── dropoute_ift_no_lrdecay.yaml
│ │ ├── dropouto
│ │ │ ├── dropouto_2layer_lrdecay.yaml
│ │ │ ├── dropouto_2layer_no_lrdecay.yaml
│ │ │ ├── dropouto_ift_lrdecay.yaml
│ │ │ ├── dropouto_ift_neumann_1_lrdecay.yaml
│ │ │ ├── dropouto_ift_neumann_1_no_lrdecay.yaml
│ │ │ ├── dropouto_ift_no_lrdecay.yaml
│ │ │ ├── dropouto_lrdecay.yaml
│ │ │ ├── dropouto_no_lrdecay.yaml
│ │ │ └── dropouto_perparam_ift_no_lrdecay.yaml
│ │ └── wdecay
│ │ ├── ift_wdecay_per_param_no_lrdecay.yaml
│ │ ├── wdecay_ift_lrdecay.yaml
│ │ └── wdecay_ift_neumann_1_lrdecay.yaml
│ ├── create_command_script.py
│ ├── data.py
│ ├── embed_regularize.py
│ ├── getdata.sh
│ ├── locked_dropout.py
│ ├── logger.py
│ ├── model_basic.py
│ ├── plot_utils.py
│ ├── rnn_utils.py
│ ├── run_grid_search.py
│ ├── train.py
│ ├── train2.py
│ └── weight_drop.py
├── search_configs
│ ├── cifar100_wideresnet_bern_dropout_sep.yaml
│ ├── cifar100_wideresnet_gauss_dropout_sep.yaml
│ ├── cifar10_resnet32_data_aug.yaml
│ ├── cifar10_resnet32_grid.yaml
│ ├── cifar10_resnet32_random.yaml
│ ├── cifar10_resnet32_wdecay_per_layer.yaml
│ ├── cifar10_wideresnet_bern_dropout.yaml
│ ├── cifar10_wideresnet_bern_dropout_sep.yaml
│ ├── cifar10_wideresnet_gauss_dropout.yaml
│ ├── cifar10_wideresnet_gauss_dropout_sep.yaml
│ ├── isic_grid.yaml
│ └── isic_random.yaml
├── search_scripts
│ ├── cifar100_wideresnet_bern_dropout_sep
│ ├── cifar100_wideresnet_gauss_dropout_sep
│ ├── cifar100_wideresnet_random
│ ├── cifar10_wideresnet_bern_dropout
│ ├── cifar10_wideresnet_bern_dropout_sep
│ ├── cifar10_wideresnet_gauss_dropout
│ └── cifar10_wideresnet_gauss_dropout_sep
├── srun_script.sh
├── stn
│ ├── datasets
│ │ ├── __init__.py
│ │ ├── cifar.py
│ │ └── loaders.py
│ ├── hypermodels
│ │ ├── __init__.py
│ │ ├── alexnet.py
│ │ ├── hyperconv2d.py
│ │ ├── hyperlinear.py
│ │ └── small.py
│ ├── hypertrain.py
│ ├── models
│ │ ├── __init__.py
│ │ ├── alexnet.py
│ │ └── small.py
│ └── util
│ ├── __init__.py
│ ├── cutout.py
│ ├── dropout.py
│ └── hyperparameter.py
├── train.py
├── train_augment_net2.py
├── train_augment_net_graph.py
├── train_augment_net_multiple.py
├── train_augment_net_slurm.py
├── train_baseline.py
├── train_checkpoint.py
└── utils
├── csv_logger.py
├── discrete_utils.py
├── logger.py
├── plot_utils.py
└── util.py
17 directories, 103 files