This project compared several metrics for predicting the best pre-trained models for fine-tuning on a new downstream task, without having to fine-tune each model. In particular, it explored the case of transferring ImageNet pre-trained vision transformers (ViTs) to several new image classification datasets. The metrics can be classified as zero-cost, such as the number of model parameters or claimed ImageNet validation accuracy, or low-cost (much cheaper than fine-tuning), such as those proposed by Renggli et al. and Kaichao You et al.
This repository contains the code used to run experiments fine-tuning a pool of ViTs on several new datasets, computing the metrics on the same models/datasets, and comparing the rankings of the models by the metrics to the fine-tuned model accuracies actually achieved.
Clone this repository
Install with pip (or see the developer setup below for using a poetry environment instead):
pip install .
If using LogME, NCE or LEEP, clone the following repository https://github.com/thuml/LogME into `src`:
git clone https://github.com/thuml/LogME.git src/locomoset/LogME
ImageNet-1k is gated, so you need to log in with a HuggingFace token to download it (tokens are under https://huggingface.co/settings/tokens in your account settings). Log in to the HuggingFace CLI:
huggingface-cli login
Once you've done this, head over to https://huggingface.co/datasets/imagenet-1k, read the terms and conditions, and, if you are happy to proceed, agree to them. Then run:
python -c "import datasets; datasets.load_dataset('imagenet-1k')"
But note this will take a long time (hours).
To run either metrics or training in LoCoMoSeT (see below), metrics and/or training config files are required. Examples are given in example_metrics_config.yaml and example_train_config.yaml for metrics and training configs respectively.
Both kinds of config should contain:

- `caches`: Contains cache locations and extra caching-related arguments:
  - `models`, `datasets`, `wandb`: Where to cache HuggingFace models & datasets and wandb runs & artifacts respectively.
  - `preprocess_cache`: Set to `disk`, `ram`, or `tmp` to cache preprocessed data to disk (default), memory, or a temporary directory.
  - `tmp_dir`: Overwrites the location of the temporary directory (usually only relevant if `preprocess_cache` is `tmp`, and generally should only be set if the default tmp dir for the OS is not large enough, e.g. on Baskerville this can be set to a path in `/scratch-global` if you need more disk quota than what's available in `/tmp`).
  - `writer_batch_size`: How many images to cache in memory before writing to disk during preprocessing (relevant if `preprocess_cache` is `disk` or `tmp`).
- `dataset_name`: Name of the dataset on HuggingFace
- `dataset_args`: Contains dataset split/column selection parameters:
  - `train_split`, `val_split`, `test_split`: Training, validation, and test split names. They should either already exist or they will be generated, and they should NOT be the same.
  - `val_size`, `test_size`: Percentages (0.-1.) or integers denoting the size of the validation and test sets to be generated (if a percentage, as a percentage of the WHOLE dataset). Can be `null` (`None`) if the corresponding split already exists.
  - `image_field`, `label_field`: Dataset columns containing the images and class labels.
- `model_name`: Name of the model to be used on HuggingFace
- `random_state`: Seed for random number generation
- `run_name`: Name for the wandb run
- `save_dir`: Directory in which to save results
- `use_wandb`: Set to `true` to log results to wandb
- `n_samples`: The training dataset size, or `null` to use the whole train split.
- `keep_labels`: A list of labels denoting which labels to keep - all images corresponding to other labels are dropped. Can be `null` to keep all.

If `use_wandb` is `true`, then under `wandb_args` the following should additionally be specified:

- `entity`: Wandb entity name
- `project`: Wandb project name
- `job_type`: Job type to group wandb runs with. Should be `metrics` or `train`
- `log_model`: How to handle model logging in wandb
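To make the structure concrete, here is a minimal sketch of how these shared fields might be laid out. All values below (model name, dataset name, paths, etc.) are placeholders for illustration only; the authoritative examples are example_metrics_config.yaml and example_train_config.yaml.

```yaml
# Illustrative sketch only - all values below are placeholders.
caches:
  models: /path/to/cache/models
  datasets: /path/to/cache/datasets
  wandb: /path/to/cache/wandb
  preprocess_cache: disk
  writer_batch_size: 1000
dataset_name: imagenet-1k
dataset_args:
  train_split: train
  val_split: validation
  test_split: test
  val_size: null        # splits above already exist, so no new splits are generated
  test_size: null
  image_field: image
  label_field: label
model_name: google/vit-base-patch16-224
random_state: 42
run_name: example-run
save_dir: results
use_wandb: true
n_samples: null         # use the whole train split
keep_labels: null       # keep all labels
wandb_args:
  entity: my-entity
  project: my-project
  job_type: metrics
  log_model: checkpoint
```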
Metrics configs should additionally contain:
- `metrics_samples`: How many samples to compute the metrics with. This will be a subset of the training dataset so should be less than or equal to the `n_samples` value, or `null` to compute them with the whole train split (i.e. with `n_samples` images).
- `local_save`: Set to `true` to locally save a copy of the results
- `metrics`: A list of metrics implemented in src/locomoset/metrics to be used
- `metric_kwargs`: A list of the pattern `metric_name: kwarg_1: value` of kwargs to be passed to each metric if desired. Not every metric used needs to have an entry here.
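For example, the metrics-specific section of a config might look like the sketch below. The metric names and kwargs are placeholders; check src/locomoset/metrics for the metrics that are actually implemented and the arguments they accept.

```yaml
# Illustrative sketch only - metric names and kwargs are placeholders.
metrics_samples: 1000   # must be <= n_samples, or null for the whole train split
local_save: true
metrics:
  - some_metric
  - another_metric
metric_kwargs:
  some_metric:
    kwarg_1: value
```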
Train configs should additionally contain the following nested under `dataset_args`:
- `train_split`: Name of the data split to train on
- `val_split`: Name of the data split to evaluate on. If the same as `train_split`, the `train_split` will itself be randomly split for training and evaluation.
Along with any further `training_args`, which are all directly passed to HuggingFace `TrainingArguments`, for example:
- `eval_steps`: Steps between each evaluation
- `evaluation_strategy`: HuggingFace evaluation strategy
- `logging_strategy`: HuggingFace logging strategy
- `num_train_epochs`: Number of epochs to train the model for
- `output_dir`: Directory to store outputs in
- `overwrite_output_dir`: Whether to overwrite the output directory
- `save_strategy`: HuggingFace saving strategy
- `use_mps_device`: Whether to use MPS
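A sketch of the training-specific additions (both the `dataset_args` and `training_args` parts) is given below. The values are illustrative only, and any other key accepted by HuggingFace `TrainingArguments` can be placed under `training_args`.

```yaml
# Illustrative sketch only - values are placeholders.
dataset_args:
  train_split: train
  val_split: validation   # if equal to train_split, train_split is itself split for evaluation
training_args:
  output_dir: output
  overwrite_output_dir: true
  num_train_epochs: 5
  evaluation_strategy: steps
  eval_steps: 500
  logging_strategy: steps
  save_strategy: epoch
  use_mps_device: false
```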
Since in practice you will likely wish to run many jobs together, LoCoMoSeT provides support for top-level configs from which you can generate many lower-level configs. Top-level configs can contain parameters for metrics scans, model training, or both. Broadly, this should contain the arguments laid out above, with some additional arguments and changes. An example is given in example_top_config.yaml.
The additional arguments are:
- `config_dir`: Location to store subconfigs
- `slurm_template_name`: Name of the slurm template to be used. If set to `null`, it will be picked from src/locomoset/config
- `use_bask`: Set to `True` if you wish to run the jobs on Baskerville (the HPC used in our research; for uses outside the Turing, this means a slurm script will be generated alongside the configs)
If `use_bask` is `True`, then you should include the following additional arguments nested under `bask`. They should be further nested under `train` and/or `metrics` as required (see the sketch after this list):
- `job_name`: Baskerville job name
- `walltime`: Maximum runtime for the Baskerville job. Format is dd-hh:mm:ss
- `node_number`: Number of nodes to use
- `gpu_number`: Number of GPUs to use
- `cpu_per_gpu`: Number of CPUs per GPU
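As a sketch of the nesting (the resource values are placeholders, not recommendations):

```yaml
# Illustrative sketch only - resource requests are placeholders.
use_bask: True
bask:
  metrics:
    job_name: locomoset-metrics
    walltime: 00-12:00:00
    node_number: 1
    gpu_number: 1
    cpu_per_gpu: 36
  train:
    job_name: locomoset-train
    walltime: 01-00:00:00
    node_number: 1
    gpu_number: 1
    cpu_per_gpu: 36
```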
The changes are:
- `models`: Replaces `model_name`; contains a list of HuggingFace model names
- `random_states`: Replaces `random_state`; contains a list of seeds to generate scripts over
- `keep_labels`: Now a list of lists to generate scripts over
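Putting the changes together, the relevant part of a top-level config might look like the following sketch (model names, seeds, and labels are placeholders):

```yaml
# Illustrative sketch only - model names, seeds, and labels are placeholders.
config_dir: configs/generated
slurm_template_name: null
models:
  - google/vit-base-patch16-224
  - facebook/deit-base-patch16-224
random_states:
  - 42
  - 43
keep_labels:
  - [0, 1, 2, 3, 4]
  - [5, 6, 7, 8, 9]
```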
To generate configs from the top-level config, run:
locomoset_gen_configs <top_level_config_file_path>
This will generate training and/or metrics configs across all combinations of model, dataset, and random state. `locomoset_gen_configs` will automatically detect whether your top-level config contains training and/or metrics-related arguments and will generate the corresponding kinds of config accordingly.
With the environment activated (`poetry shell`):
locomoset_run_metrics <config_file_path>
For an example config file see configs/config_wp1.yaml.
This script will compute metrics scores for a given model, dataset, and random state.
With the environment activated (`poetry shell`):
locomoset_run_train <config_file_path>
This script will train a model for a given model name, dataset, and random state.
This plot shows how the metric values (y-axis) change with the number of images (samples) used to compute them (x-axis). Ideally the metric should converge to some fixed value which does not change much after the number of images is increased. The number of images it takes to get a reliable performance prediction determines how long it takes to compute the metric, so metrics that converge after seeing fewer images are preferable.
To make a plot of metric scores vs. the number of samples used to compute them:
locomoset_plot_vs_samples <PATH_TO_RESULTS_DIR>
Where `<PATH_TO_RESULTS_DIR>` is the path to a directory containing JSON files produced by a metric scan (see above). You can also run `locomoset_plot_vs_samples --help` to see the arguments.
This plot shows the predicted performance score for each model from one of the low-cost metrics on the x-axis, and the actual fine-tuned performance of the models on that dataset on the y-axis. A high quality metric should have high correlation between its score (which is meant to reflect the transferability of the model to the new dataset) and the actual fine-tuned model performance.
To make this plot:
locomoset_plot_vs_actual <PATH_TO_RESULTS_DIR> --scores_file <path_to_scores_file> --n_samples <n_samples>
Where:

- `<PATH_TO_RESULTS_DIR>` is the path to a directory containing JSON files produced by a metric scan (see above).
- `<path_to_scores_file>` is a mapping between model names and fine-tuned performance on ImageNet-1k, such as the file configs/scores_imagenet1k.yaml in this repo (a hypothetical sketch of such a file is shown below).
- `<n_samples>` sets the number of samples (images) the metric was computed with to plot. Usually a metrics scan includes results with different numbers of images, but for this plot the different metrics should be compared using a fixed number of images only.

You can also run `locomoset_plot_vs_actual --help`
to see the arguments.
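The scores file is a mapping from model names to their fine-tuned ImageNet-1k performance. As a hypothetical sketch (model names and numbers below are made up; see configs/scores_imagenet1k.yaml for the real values):

```yaml
# Hypothetical sketch only - see configs/scores_imagenet1k.yaml for the actual file.
google/vit-base-patch16-224: 0.81
facebook/deit-base-patch16-224: 0.82
```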
Install dependencies with Poetry
poetry install
If using LogME, NCE or LEEP, clone the following repository https://github.com/thuml/LogME into `src`:
git clone https://github.com/thuml/LogME.git src/locomoset/LogME
Install pre-commit hooks:
poetry run pre-commit install --install-hooks
To run commands in the poetry virtual environment (in a terminal), either:
- Prefix the command with `poetry run`, e.g.:

  poetry run python myscript.py

- Or activate the environment with `poetry shell` and then run commands as normal, using `exit` to leave the environment when finished.
To run tests:
poetry run pytest
To run linters: `flake8`, `black`, and `isort` will run automatically before making commits, but can also be run manually:

poetry run black .
poetry run isort .
poetry run flake8