MLRichter / phd-lab

Experimentor based on delve-deep-dive

PHD-LAB Experimental Environment for Saturation based Experiments

Introduction

The phd-lab repository contains routines for training networks, extracting latent representations, computing saturation, and running other experiments such as probe training.

Installing phd-lab

Phd-lab is written in Python. It uses several third-party modules, which have to be installed in order to run the experiments. The following two sections provide installation instructions.

Installation with pip

The file requirements.txt can be used to install all requirements with pip into a new virtual environment (called phd-lab-env):

python3 -m venv phd-lab-env
source phd-lab-env/bin/activate
pip3 install -r requirements.txt

Remarks:

Installation with conda

When using conda, you can use the file environment.yml to set up a new conda environment called phd-lab, containing all required packages:

conda env create -f environment.yml
conda activate phd-lab

Remarks:

Configure your Experiments

Models are configured using JSON files. These files are collected in the ./configs folder.

{
    "model": ["resnet18", "vgg13", "myNetwork"],
    "epoch": [30],
    "batch_size": [128],

    "dataset": ["Cifar10", "ImageNet"],
    "resolution": [32, 224],

    "optimizer": ["adam", "radam"],
    "metrics": ["Accuracy", "Top5Accuracy", "MCC"],

    "logs_dir": "./logs/",
    "device": "cuda:0",

    "conv_method": ["channelwise"],
    "delta": [0.99],
    "data_parallel": false,
    "downsampling": null
}

Note that some elements are written as lists and some are not. A config can describe an arbitrary number of experiments, where the number of experiments is the number of possible value combinations. The only exception to this rule is the metrics key, whose values are always provided as a list and are all used in every experiment. In the above example, we train 3 models on 2 datasets using 2 optimizers. This results in 3x2x2=12 total experiments. It is not necessary to set all these parameters every time. If a parameter is not set, a default value will be injected. You can inspect the default values of all configuration keys in phd_lab.experiments.utils.config.DEFAULT_CONFIG.
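The counting rule can be illustrated with a minimal sketch. The helper expand_config and its shared_keys parameter are hypothetical names chosen for this example and are not part of phd-lab. The reduced config below only contains the three multi-valued keys counted in the text, so it does not show how phd-lab treats other list-valued keys such as resolution.

```python
from itertools import product


def expand_config(config, shared_keys=("metrics",)):
    """Return one flat dict per experiment described by `config`.

    Hypothetical helper: list-valued keys span the grid, except for the
    keys in `shared_keys`, which are passed unchanged to every experiment.
    """
    grid_keys = [
        key for key, value in config.items()
        if isinstance(value, list) and key not in shared_keys
    ]
    shared = {key: value for key, value in config.items() if key not in grid_keys}

    experiments = []
    # The Cartesian product enumerates every possible value combination.
    for values in product(*(config[key] for key in grid_keys)):
        experiment = dict(shared)
        experiment.update(zip(grid_keys, values))
        experiments.append(experiment)
    return experiments


config = {
    "model": ["resnet18", "vgg13", "myNetwork"],
    "dataset": ["Cifar10", "ImageNet"],
    "optimizer": ["adam", "radam"],
    "metrics": ["Accuracy", "Top5Accuracy", "MCC"],
    "device": "cuda:0",
}

experiments = expand_config(config)
print(len(experiments))  # 3 models x 2 datasets x 2 optimizers = 12
```

Every resulting dict carries the full metrics list, while model, dataset and optimizer each hold a single value.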

Logging

Logging is done in a folder structure. The root folder of the logs is specified by logs_dir in the config file. The system uses the following save structure:

+-- logs
|   +-- MyModel
|   |   +-- MyDataset1_64                                               //dataset name followed by input resolution
|   |   |   +-- MyRun                                                   //id of this specific run
|   |   |   |   +--  probe_performance.csv                              //if you compute probe performances this file is added containing accuracies per layer, you may add a prefix to this file
|   |   |   |   +--  projected_results.csv                              //if you projected the networks
|   |   |   |   +--  computational_info.json                            //train_model.py will compute some meta info on FLOPS per inference step and save it as json
|   |   |   |   +--  MyModel-MyDataset1-r64-bs128-e30_config.json       //config file that lets you repeat this specific run
|   |   |   |   +--  MyModel-MyDataset1-r64-bs128-e30.csv               //saturation and metrics
|   |   |   |   +--  MyModel-MyDataset1-r64-bs128-e30.pt                //model, lr-scheduler and optimizer states
|   |   |   |   +--  MyModel-MyDataset1-r64-bs128-e30lsat_epoch0.png    //plots of saturation and intrinsic dimensionality
|   |   |   |   +--  MyModel-MyDataset1-r64-bs128-e30lsat_epoch1.png   
|   |   |   |   +--  MyModel-MyDataset1-r64-bs128-e30lsat_epoch2.png    
|   |   |   |   +--  .                                              
|   |   |   |   +--  .                                             
|   |   |   |   +--  .                                              
|   +-- VGG16
|   |   +-- Cifar10_32
.   .   .   .   .     
.   .   .   .   .   
.   .   .   .   .

The only exception to this logging structure are the latent representations, which will be dumped in the folder latent_datasets at the top level of this repository. The reason for this is the size of the latent representations on the hard drive. You likely want to keep your light-weight csv results in the logs, but may want to remove extracted latent representations on a regular basis to free up space. (They can be re-extracted from the saved model quite easily, so it's not even really a loss of time.)
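When post-processing results, it can help to rebuild these paths programmatically. The following is a minimal sketch that follows the naming pattern visible in the tree above. The helper names run_directory and result_file_stem are hypothetical and do not exist in phd-lab.

```python
from pathlib import Path


def run_directory(logs_dir, model, dataset, resolution, run_id):
    """Folder of one run: <logs_dir>/<model>/<dataset>_<resolution>/<run_id>."""
    return Path(logs_dir) / model / f"{dataset}_{resolution}" / run_id


def result_file_stem(model, dataset, resolution, batch_size, epochs):
    """Common file name prefix of the csv, pt and config files of a run."""
    return f"{model}-{dataset}-r{resolution}-bs{batch_size}-e{epochs}"


run_dir = run_directory("./logs/", "MyModel", "MyDataset1", 64, "MyRun")
stem = result_file_stem("MyModel", "MyDataset1", 64, 128, 30)

csv_path = run_dir / f"{stem}.csv"           # saturation and metrics
config_path = run_dir / f"{stem}_config.json"  # config of this run
print(csv_path.as_posix())
```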

Running Experiments

Execution of experiments is fairly straightforward. You can easily write scripts if you want to deviate from the out-of-the-box configurations (more on that later). In the phd_lab folder you will find scripts handling different kinds of model training and analysis.

Training models

There are 4 scripts that will conduct a training if called. It is worth noting that each script calls the same Main-Function object, just in a slightly different configuration. They therefore share the same command line arguments and basic execution logic. The scripts are:

All of these scripts have the same arguments:

In addition, extract_latent_representations.py takes one more argument:

Checkpointing

All metrics and the model itself are checkpointed after each epoch, and the previous weights are overwritten. The system will automatically resume training at the end of the last finished epoch. If one or more trainings were completed, these trainings are skipped. Please note that post-training actions like the extraction of latent representations will still be executed. Furthermore, runs are identified by their run-id. Runs under different run-ids generally do not recognize each other, even if they are based on the same configuration.
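The resume behaviour described above can be pictured with a small, purely illustrative sketch. It is not phd-lab's implementation: phd-lab saves model, lr-scheduler and optimizer states in a .pt file, whereas this stand-in only stores an epoch counter in a JSON file, and the function name train is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def train(checkpoint_path, total_epochs):
    """Run the remaining epochs and return the epochs executed in this call."""
    checkpoint_path = Path(checkpoint_path)
    state = {"finished_epochs": 0}
    if checkpoint_path.exists():
        # Resume after the last finished epoch.
        state = json.loads(checkpoint_path.read_text())

    executed = []
    for epoch in range(state["finished_epochs"], total_epochs):
        # ... one epoch of training would happen here ...
        executed.append(epoch)
        state["finished_epochs"] = epoch + 1
        # Overwrite the previous checkpoint after every epoch.
        checkpoint_path.write_text(json.dumps(state))
    return executed


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "MyRun.json"
    first = train(checkpoint, total_epochs=2)     # stands in for an interrupted run
    resumed = train(checkpoint, total_epochs=5)   # continues with epoch 2
    repeated = train(checkpoint, total_epochs=5)  # nothing left to do, training is skipped

print(first, resumed, repeated)
```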

Extracting latent representations

Latent representations for an experiment (a specific model and dataset) can be obtained by the script extract_latent_representations.py.

python extract_latent_representations.py --config ./configs/myconfig.json --device cuda:0 --run-id MyRun --downsample 4

The script expects the usual parameters --config, --device, and --run-id, and the following additional value:

This script will feed the full dataset through the model and store the observed activation patterns for each layer. The data are stored in the directory latent_datasets/[experiment]/ and the files are named [train|eval]-[layername].p:

+-- latent_datasets/
|   +-- ResNet18_XXS_Cifar10_32/
|   |   +-- eval-layer1-0-conv1.p
|   |   +-- eval-layer1-0-conv2.p
|   |   +-- ...
|   |   +-- model_pointer.txt
|   |   +-- train-layer1-0-conv1.p
|   |   +-- train-layer1-0-conv2.p
|   |   +-- ...
.   .
.   .
.   .

The .p files are pickle files containing numpy arrays with the latent representations. The file model_pointer.txt contains the path to the log files.
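If you want to process the extracted files yourself, the naming scheme can be parsed as in the following sketch. The helpers parse_latent_filename and group_by_layer are hypothetical and only rely on the [train|eval]-[layername].p pattern described above.

```python
from pathlib import Path


def parse_latent_filename(path):
    """Split '[train|eval]-[layername].p' into (split, layer name)."""
    name = Path(path).name
    if not name.endswith(".p"):
        raise ValueError(f"not a latent representation file: {name}")
    # Only the first '-' separates the split from the layer name,
    # because layer names such as 'layer1-0-conv1' contain dashes too.
    split, _, layer = name[:-2].partition("-")
    if split not in ("train", "eval") or not layer:
        raise ValueError(f"unexpected file name: {name}")
    return split, layer


def group_by_layer(paths):
    """Map each layer name to its train and eval files."""
    layers = {}
    for path in paths:
        if Path(path).name == "model_pointer.txt":
            continue  # points to the log files, holds no latent data
        split, layer = parse_latent_filename(path)
        layers.setdefault(layer, {})[split] = Path(path)
    return layers


files = [
    "latent_datasets/ResNet18_XXS_Cifar10_32/eval-layer1-0-conv1.p",
    "latent_datasets/ResNet18_XXS_Cifar10_32/train-layer1-0-conv1.p",
    "latent_datasets/ResNet18_XXS_Cifar10_32/model_pointer.txt",
]
layers = group_by_layer(files)
print(sorted(layers["layer1-0-conv1"]))
```

Each collected path can then be opened in binary mode and read with pickle.load. As with any pickle file, only load files you created yourself or otherwise trust.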

Probe Classifiers and Latent Representation Extraction

Another operation that is possible with this repository is training probe classifiers on receptive fields. Probe classifiers are LogisticRegression models. They are trained on the output of a neural network layer using the original labels. Their performance relative to the model performance yields an intermediate solution quality for the trained model. After extracting the latent representations, you can train the probe classifiers on them by calling

python train_probes.py --config ./configs/myconfig.json --prefix "SomePrefix" -mp 4

The script train_probes.py can take the following arguments:

The performance of the probe classifiers is stored in the log directory under the name probe_performances.csv.

The system uses joblib caching and will recognize whether a logistic regression has already been fitted on a particular latent representation and skip training if it has, making crash recovery less painful.

Using consecutive script calls to split your workload

All experiments are strictly tied to the run-id and their configuration. This means that two trained models are considered equal if they are trained using the same configuration parameters and run-id, regardless of the called script. Therefore, you could for instance run:

python train_model.py --config ./configs/myconfig.json --device cuda:0 --run-id MyRun

followed by

python compute_receptive_field.py --config ./configs/myconfig.json --device cuda:0 --run-id MyRun

The latter script call will recognize the previously trained models, skip straight to computing the receptive field, and add the additional results to the logs.

Performing multiple tasks using the meta execution script

You can combine the steps of training a model, extracting pixelwise latent representations and training probes using the script probe_meta_execution_script.py:

python probe_meta_execution_script.py --config ../configs/your_config.json -mp ${NumberOfCoresYouWantToUse} -d pixelwise --device cuda:0 --run-id ${YourRunID}

Adding Models / Datasets / Optimizers

You may want to add optimizers, models and datasets to this experimental setup. Basically, there is a package for each of these ingredients:

You can add datasets, models, metrics and optimizers by importing the respective factories in the __init__ file of the respective packages. The interfaces for the respective factories are defined as protocols in phd_lab.experiments.domain, or you can simply orient yourself on the existing ones in the package. If you want to use entirely different registries for datasets, models, optimizers and metrics, you can change the registries by setting different values for:

These registries do not need to be Module or Package types; they merely need to have a __dict__ that maps string keys to the respective factories. The name in the config file must always match a factory in order to be a valid configuration.
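As a minimal sketch of such a non-module registry, any object with a suitable __dict__ will do, for example a types.SimpleNamespace. The factory tiny_mlp, its signature, and the lookup helper lookup_factory are assumptions made for this example. The actual factory interfaces are the protocols in phd_lab.experiments.domain.

```python
from types import SimpleNamespace


def tiny_mlp(num_classes=10):
    """Hypothetical model factory; a real one would return a network."""
    return {"name": "tiny_mlp", "num_classes": num_classes}


# A registry only needs a __dict__ that maps string keys to factories.
model_registry = SimpleNamespace(tiny_mlp=tiny_mlp)


def lookup_factory(registry, name):
    """Resolve a name from the config file to a factory in the registry."""
    try:
        return registry.__dict__[name]
    except KeyError:
        raise KeyError(f"'{name}' does not match any registered factory") from None


factory = lookup_factory(model_registry, "tiny_mlp")
model = factory(num_classes=10)
print(model["name"])
```

A name in the config that has no entry in the registry raises a KeyError here, which mirrors the requirement that every configured name must match a factory.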