This project compared several metrics for predicting the best pre-trained models for fine-tuning on a new downstream task, without having to fine-tune each model. In particular, it explored the case of transferring ImageNet pre-trained vision transformers (ViTs) to several new image classification datasets. The metrics can be classified as zero-cost, such as the number of model parameters or claimed ImageNet validation accuracy, or low-cost (much cheaper than fine-tuning), such as those proposed by Renggli et al. and Kaichao You et al.
This repository contains the code used to run experiments fine-tuning a pool of ViTs on several new datasets, computing the metrics on the same models/datasets, and comparing the rankings of the models by the metrics to the fine-tuned model accuracies actually achieved.
Clone this repository
Install with pip (or see the developer setup below for using a poetry environment instead):
pip install .
If using LogME, NCE or LEEP, clone the following repository https://github.com/thuml/LogME into `src`:
git clone https://github.com/thuml/LogME.git src/locomoset/LogME
ImageNet-1k is gated, so you need to log in with a HuggingFace token to download it (tokens are under https://huggingface.co/settings/tokens in your account settings). Log in to the HuggingFace CLI:
huggingface-cli login
Once you've done this, head over to https://huggingface.co/datasets/imagenet-1k, read the terms and conditions, and, if you are happy to proceed, agree to them. Then run:
python -c "import datasets; datasets.load_dataset('imagenet-1k')"
But note this will take a long time (hours).
To run either metrics or training in LoCoMoSeT (see below), metrics and/or training config files are required. Examples are given in example_metrics_config.yaml and example_train_config.yaml for metrics and training configs respectively.
Both kinds of config should contain:

- `caches`: Contains cache locations and extra caching-related arguments:
  - `models`, `datasets`, `wandb`: Where to cache HuggingFace models & datasets and wandb runs & artifacts respectively.
  - `preprocess_cache`: Set to `disk`, `ram`, or `tmp` to cache preprocessed data to disk (default), memory, or a temporary directory.
  - `tmp_dir`: Overwrites the location of the temporary directory (usually only relevant if `preprocess_cache` is `tmp`, and generally should only be set if the default tmp dir for the OS is not large enough, e.g. on Baskerville this can be set to a path in `/scratch-global` if you need more disk quota than what's available in `/tmp`).
  - `writer_batch_size`: How many images to cache in memory before writing to disk during preprocessing (relevant if `preprocess_cache` is `disk` or `tmp`).
- `dataset_name`: Name of the dataset on HuggingFace
- `dataset_args`: Contains dataset split/column selection parameters:
  - `train_split`, `val_split`, `test_split`: Training, validation, and test split names. They should either already exist or they will be generated, and they should NOT be the same.
  - `val_size`, `test_size`: Percentages (0.-1.) or integers denoting the size of the validation and test sets to be generated (if a percentage, as a percentage of the WHOLE dataset). Can be `null` (`None`) if the corresponding split already exists.
  - `image_field`, `label_field`: Dataset columns containing the images and class labels.
- `model_name`: Name of the model to be used on HuggingFace
- `random_state`: Seed for random number generation
- `run_name`: Name for the wandb run
- `save_dir`: Directory in which to save results
- `use_wandb`: Set to `true` to log results to wandb
- `n_samples`: The training dataset size, or `null` to use the whole train split.
- `keep_labels`: A list of labels denoting which labels to keep - all images corresponding to other labels are dropped. Can be `null` to keep all.

If `use_wandb` is `true`, then under `wandb_args` the following should additionally be specified:

- `entity`: Wandb entity name
- `project`: Wandb project name
- `job_type`: Job type to group wandb runs with. Should be `metrics` or `train`
- `log_model`: How to handle model logging in wandb
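To make the structure concrete, here is a minimal sketch of how these shared fields might be laid out. All values below (model name, dataset name, paths, etc.) are placeholders for illustration only; the authoritative examples are example_metrics_config.yaml and example_train_config.yaml.

```yaml
# Illustrative sketch only - all values below are placeholders.
caches:
  models: /path/to/cache/models
  datasets: /path/to/cache/datasets
  wandb: /path/to/cache/wandb
  preprocess_cache: disk
  writer_batch_size: 1000
dataset_name: imagenet-1k
dataset_args:
  train_split: train
  val_split: validation
  test_split: test
  val_size: null        # splits above already exist, so no new splits are generated
  test_size: null
  image_field: image
  label_field: label
model_name: google/vit-base-patch16-224
random_state: 42
run_name: example-run
save_dir: results
use_wandb: true
n_samples: null         # use the whole train split
keep_labels: null       # keep all labels
wandb_args:
  entity: my-entity
  project: my-project
  job_type: metrics
  log_model: checkpoint
```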
Metrics configs should additionally contain:
- `metrics_samples`: How many samples to compute the metrics with. This will be a subset of the training dataset so should be less than or equal to the `n_samples` value, or `null` to compute them with the whole train split (i.e. with `n_samples` images).
- `local_save`: Set to `true` to locally save a copy of the results
- `metrics`: A list of metrics implemented in src/locomoset/metrics to be used
- `metric_kwargs`: A list of the pattern `metric_name: kwarg_1: value` of kwargs to be passed to each metric if desired. Not every metric used needs to have an entry here.
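For example, the metrics-specific section of a config might look like the sketch below. The metric names and kwargs are placeholders; check src/locomoset/metrics for the metrics that are actually implemented and the arguments they accept.

```yaml
# Illustrative sketch only - metric names and kwargs are placeholders.
metrics_samples: 1000   # must be <= n_samples, or null for the whole train split
local_save: true
metrics:
  - some_metric
  - another_metric
metric_kwargs:
  some_metric:
    kwarg_1: value
```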
Train configs should additionally contain the following nested under `dataset_args`:
- `train_split`: Name of the data split to train on
- `val_split`: Name of the data split to evaluate on. If the same as `train_split`, the `train_split` will itself be randomly split for training and evaluation.
Along with any further `training_args`, which are all directly passed to HuggingFace `TrainingArguments`, for example:
- `eval_steps`: Steps between each evaluation
- `evaluation_strategy`: HuggingFace evaluation strategy
- `logging_strategy`: HuggingFace logging strategy
- `num_train_epochs`: Number of epochs to train the model for
- `output_dir`: Directory to store outputs in
- `overwrite_output_dir`: Whether to overwrite the output directory
- `save_strategy`: HuggingFace saving strategy
- `use_mps_device`: Whether to use MPS
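A sketch of the training-specific additions (both the `dataset_args` and `training_args` parts) is given below. The values are illustrative only, and any other key accepted by HuggingFace `TrainingArguments` can be placed under `training_args`.

```yaml
# Illustrative sketch only - values are placeholders.
dataset_args:
  train_split: train
  val_split: validation   # if equal to train_split, train_split is itself split for evaluation
training_args:
  output_dir: output
  overwrite_output_dir: true
  num_train_epochs: 5
  evaluation_strategy: steps
  eval_steps: 500
  logging_strategy: steps
  save_strategy: epoch
  use_mps_device: false
```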
Since in practice you will likely wish to run many jobs together, LoCoMoSeT provides support for top-level configs from which you can generate many lower-level configs. Top-level configs can contain parameters for metrics scans, model training, or both. Broadly, this should contain the arguments laid out above, with some additional arguments and changes. An example is given in example_top_config.yaml.
The additional arguments are:
- `config_dir`: Location to store subconfigs
- `slurm_template_name`: Name of the slurm template to be used. If set to `null`, it will be picked from src/locomoset/config
- `use_bask`: Set to `True` if you wish to run the jobs on Baskerville (the HPC used in our research; for uses outside the Turing, this means a slurm script will be generated alongside the configs)
If `use_bask` is `True`, then you should include the following additional arguments nested under `bask`. They should be further nested under `train` and/or `metrics` as required (see the sketch after this list):
- `job_name`: Baskerville job name
- `walltime`: Maximum runtime for the Baskerville job. Format is dd-hh:mm:ss
- `node_number`: Number of nodes to use
- `gpu_number`: Number of GPUs to use
- `cpu_per_gpu`: Number of CPUs per GPU
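As a sketch of the nesting (the resource values are placeholders, not recommendations):

```yaml
# Illustrative sketch only - resource requests are placeholders.
use_bask: True
bask:
  metrics:
    job_name: locomoset-metrics
    walltime: 00-12:00:00
    node_number: 1
    gpu_number: 1
    cpu_per_gpu: 36
  train:
    job_name: locomoset-train
    walltime: 01-00:00:00
    node_number: 1
    gpu_number: 1
    cpu_per_gpu: 36
```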
The changes are:
- `models`: Replaces `model_name`; contains a list of HuggingFace model names
- `random_states`: Replaces `random_state`; contains a list of seeds to generate scripts over
- `keep_labels`: Now a list of lists to generate scripts over
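Putting the changes together, the relevant part of a top-level config might look like the following sketch (model names, seeds, and labels are placeholders):

```yaml
# Illustrative sketch only - model names, seeds, and labels are placeholders.
config_dir: configs/generated
slurm_template_name: null
models:
  - google/vit-base-patch16-224
  - facebook/deit-base-patch16-224
random_states:
  - 42
  - 43
keep_labels:
  - [0, 1, 2, 3, 4]
  - [5, 6, 7, 8, 9]
```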
To generate configs from the top-level config, run:
locomoset_gen_configs <top_level_config_file_path>
This will generate training and/or metrics configs across all combinations of model, dataset, and random state. `locomoset_gen_configs` will automatically detect whether your top-level config contains training and/or metrics-related arguments and will generate the corresponding kinds of config accordingly.
With the environment activated (`poetry shell`):
locomoset_run_metrics <config_file_path>
For an example config file see configs/config_wp1.yaml.
This script will compute metrics scores for a given model, dataset, and random state.
With the environment activated (`poetry shell`):
locomoset_run_train <config_file_path>
This script will train a model for a given model name, dataset, and random state.
This plot shows how the metric values (y-axis) change with the number of images (samples) used to compute them (x-axis). Ideally the metric should converge to some fixed value which does not change much after the number of images is increased. The number of images it takes to get a reliable performance prediction determines how long it takes to compute the metric, so metrics that converge after seeing fewer images are preferable.
To make a plot of metric scores vs. the number of samples used to compute them:
locomoset_plot_vs_samples <PATH_TO_RESULTS_DIR>
Where `<PATH_TO_RESULTS_DIR>` is the path to a directory containing JSON files produced by a metric scan (see above). You can also run `locomoset_plot_vs_samples --help` to see the arguments.
This plot shows the predicted performance score for each model from one of the low-cost metrics on the x-axis, and the actual fine-tuned performance of the models on that dataset on the y-axis. A high quality metric should have high correlation between its score (which is meant to reflect the transferability of the model to the new dataset) and the actual fine-tuned model performance.
To make this plot:
locomoset_plot_vs_actual <PATH_TO_RESULTS_DIR> --scores_file <path_to_scores_file> --n_samples <n_samples>
Where:

- `<PATH_TO_RESULTS_DIR>` is the path to a directory containing JSON files produced by a metric scan (see above).
- `<path_to_scores_file>` is a mapping between model names and fine-tuned performance on ImageNet-1k, such as the file configs/scores_imagenet1k.yaml in this repo (a hypothetical sketch of such a file is shown below).
- `<n_samples>` sets the number of samples (images) the metric was computed with to plot. Usually a metrics scan includes results with different numbers of images, but for this plot the different metrics should be compared using a fixed number of images only.

You can also run `locomoset_plot_vs_actual --help`
to see the arguments.
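The scores file is a mapping from model names to their fine-tuned ImageNet-1k performance. As a hypothetical sketch (model names and numbers below are made up; see configs/scores_imagenet1k.yaml for the real values):

```yaml
# Hypothetical sketch only - see configs/scores_imagenet1k.yaml for the actual file.
google/vit-base-patch16-224: 0.81
facebook/deit-base-patch16-224: 0.82
```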
Install dependencies with Poetry
poetry install
If using LogME, NCE or LEEP, clone the following repository https://github.com/thuml/LogME into `src`:
git clone https://github.com/thuml/LogME.git src/locomoset/LogME
Install pre-commit hooks:
poetry run pre-commit install --install-hooks
To run commands in the poetry virtual environment (in a terminal), either:
- Prefix the command with `poetry run`, e.g.:

  poetry run python myscript.py

- Or activate the environment with `poetry shell` and then run commands as normal, using `exit` to leave the environment when finished.
To run tests:
poetry run pytest
To run linters: `flake8`, `black`, and `isort` will run automatically before making commits, but can also be run manually:

poetry run black .
poetry run isort .
poetry run flake8