
LoCoMoSeT: Low-Cost Model Selection for Transformers

This project compared several metrics for predicting which pre-trained models will perform best after fine-tuning on a new downstream task, without having to fine-tune each model. In particular, it explored transferring ImageNet pre-trained vision transformers (ViTs) to several new image classification datasets. The metrics can be classified as zero-cost, such as the number of model parameters or claimed ImageNet validation accuracy, or low-cost (much cheaper than fine-tuning), such as those proposed by Renggli et al. and You et al.

This repository contains the code used to run the experiments: fine-tuning a pool of ViTs on several new datasets, computing the metrics on the same models/datasets, and comparing the rankings of the models by the metrics to the actual achieved fine-tuned accuracies.

Installation

  1. Clone this repository

  2. Install with pip (or see the developer setup below for using a poetry environment instead):

    pip install .
  3. If using LogME, NCE, or LEEP, clone https://github.com/thuml/LogME into src:

    git clone https://github.com/thuml/LogME.git src/locomoset/LogME
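
To check the installation, you can try importing the package (a quick sanity check; it should exit without errors):

    python -c "import locomoset"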

Usage

Download ImageNet

ImageNet-1k is gated, so you need to log in with a HuggingFace token to download it (tokens can be created under https://huggingface.co/settings/tokens in your account settings). Log in to the HuggingFace CLI:

huggingface-cli login
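
Alternatively, you can authenticate from Python using the huggingface_hub library's login helper (it will prompt for your token if you don't pass one):

    from huggingface_hub import login

    # Prompts for a HuggingFace token, or pass one directly: login(token="hf_...")
    login()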

Once you've done this, head over to https://huggingface.co/datasets/imagenet-1k, read the terms and conditions, and agree to them if you're happy to proceed. Then run:

python -c "import datasets; datasets.load_dataset('imagenet-1k')"

Note that this will take a long time (hours).

Config Files

To run a metric scan or model training with LoCoMoSeT (see below), you need a metrics and/or training config file. Examples are given in example_metrics_config.yaml and example_train_config.yaml for metrics and training configs respectively.

Both kinds of config should contain:

If use_wandb is true, then under wandb_args the following should additionally be specified:
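
The exact keys are listed in the example configs. As an illustration only (entity and project below mirror standard wandb.init arguments and are not necessarily the names LoCoMoSeT uses):

    use_wandb: true
    wandb_args:
      entity: my-team      # hypothetical: the wandb user or team to log runs under
      project: locomoset   # hypothetical: the wandb project name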

Metrics configs should additionally contain:

Train configs should additionally contain the following nested under dataset_args:

Along with any further training_args, which are all passed directly to HuggingFace TrainingArguments, for example:
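
(An illustrative sketch; the field names below are standard TrainingArguments options, but the values are placeholders.)

    training_args:
      output_dir: results
      num_train_epochs: 3
      per_device_train_batch_size: 32
      learning_rate: 5.0e-05
      evaluation_strategy: epoch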

Since in practice you will likely want to run many jobs together, LoCoMoSeT supports top-level configs from which you can generate many lower-level configs. A top-level config can contain parameters for metric scans, model training, or both. Broadly, it should contain the arguments laid out above, with some additional arguments and changes. An example is given in example_top_config.yaml.

The additional arguments are:

If use_bask is true, you should include the following additional arguments nested under bask, further nested under train and/or metrics as required:

The changes are:

To generate configs from the top-level config, run:

locomoset_gen_configs <top_level_config_file_path>

This will generate training and/or metrics configs across all combinations of model, dataset, and random state. locomoset_gen_configs will automatically detect whether your top-level config contains training and/or metrics-related arguments and will generate both kinds of config accordingly.

Run a metric scan

With the environment activated (poetry shell):

locomoset_run_metrics <config_file_path>

For an example config file see configs/config_wp1.yaml.

This script will compute metrics scores for a given model, dataset, and random state.

Train a model

With the environment activated (poetry shell):

locomoset_run_train <config_file_path>

This script will train a model for a given model name, dataset, and random state.

Save plots

Metric Scores vs. No. of Images

This plot shows how the metric values (y-axis) change with the number of images (samples) used to compute them (x-axis). Ideally, the metric should converge to some fixed value that does not change much as the number of images increases. The number of images needed to get a reliable performance prediction determines how long the metric takes to compute, so metrics that converge after seeing fewer images are preferable.

To make a plot of metric scores vs. the number of images used to compute them:

locomoset_plot_vs_samples <PATH_TO_RESULTS_DIR>

Where <PATH_TO_RESULTS_DIR> is the path to a directory containing JSON files produced by a metric scan (see above).

You can also run locomoset_plot_vs_samples --help to see the arguments.

Metric Scores vs. Fine-Tuned Performance

This plot shows the predicted performance score for each model from one of the low-cost metrics on the x-axis, and the actual fine-tuned performance of the models on that dataset on the y-axis. A high-quality metric should show a strong correlation between its score (which is meant to reflect the transferability of the model to the new dataset) and the actual fine-tuned model performance.
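
One way to quantify this relationship, sketched here with scipy rather than any LoCoMoSeT command (the values are made up for illustration), is a rank correlation between metric scores and fine-tuned accuracies:

    from scipy.stats import spearmanr

    # Hypothetical values: one metric score and one fine-tuned accuracy
    # per model, listed in the same model order.
    metric_scores = [0.21, 0.35, 0.30, 0.48]
    fine_tuned_accuracies = [0.71, 0.84, 0.78, 0.90]

    correlation, p_value = spearmanr(metric_scores, fine_tuned_accuracies)
    print(f"Spearman rank correlation: {correlation:.2f} (p = {p_value:.2f})")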

To make this plot:

locomoset_plot_vs_actual <PATH_TO_RESULTS_DIR> --scores_file <path_to_scores_file> --n_samples <n_samples>

Where <PATH_TO_RESULTS_DIR> is the path to a directory containing JSON files produced by a metric scan (see above), <path_to_scores_file> is the path to a file containing the actual fine-tuned performance scores, and <n_samples> is the number of samples at which to compare the metric scores.

You can also run locomoset_plot_vs_actual --help to see the arguments.

Development

Developer Setup

  1. Install dependencies with Poetry

    poetry install
  2. If using LogME, NCE, or LEEP, clone https://github.com/thuml/LogME into src:

    git clone https://github.com/thuml/LogME.git src/locomoset/LogME
  3. Install pre-commit hooks:

    poetry run pre-commit install --install-hooks

Common Commands/Tasks