This codebase is designed for training large-scale vision models using Cloud TPU VMs or GPU machines. It is based on Jax/Flax libraries, and uses tf.data and TensorFlow Datasets for scalable and reproducible input pipelines.
The open-sourcing of this codebase has two main purposes: publishing the code of research projects developed within it, and providing a strong starting point for running large-scale vision experiments on GPU machines and Cloud TPUs.
big_vision
aims to support research projects at Google. We are unlikely to
work on feature requests or accept external contributions, unless they were
pre-approved (ask in an issue first). For a well-supported transfer-only
codebase, see also vision_transformer.
Note that big_vision is quite a dynamic codebase and, while we intend to keep the core code fully functional at all times, we cannot guarantee timely updates of the project-specific code that lives in the .../proj/... subfolders. However, we provide a table with the last known commits at which specific projects were known to work.
The following research projects were originally conducted in the big_vision
codebase:
The main entry point is a trainer module, which typically does all the
boilerplate related to creating a model and an optimizer, loading the data,
checkpointing and training/evaluating the model inside a loop. We provide the
canonical trainer train.py
in the root folder. Normally, individual projects
within big_vision
fork and customize this trainer.
All models, evaluators and preprocessing operations live in the corresponding subdirectories and can often be reused between different projects. We encourage compatible APIs within these directories to facilitate reusability, but this is not strictly enforced, as individual projects may need to introduce custom APIs.
We have a powerful configuration system, with the configs living in the
configs/
directory. Custom trainers and modules can directly extend/modify
the configuration options.
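Config files can also accept arguments after a colon, and individual config fields can be overridden directly from the command line. As a sketch combining both (mirroring the GPU and TPU examples elsewhere in this document):

```sh
python3 -m big_vision.train \
  --config big_vision/configs/transfer.py:model=vit-i21k-augreg-b/32,dataset=cifar10,crop=resmall_crop \
  --workdir workdirs/`date '+%m-%d_%H%M'` \
  --config.lr=0.03  # overrides the learning rate defined in the config
```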
Project-specific code resides in the .../proj/... namespace. It is not always possible to keep project-specific code in sync with the core big_vision libraries. Below we provide the last known commit for each project where the project code is expected to work.
Training jobs are robust to interruptions and will resume seamlessly from the
last saved checkpoint (assuming a user provides the correct --workdir
path).
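For example, to make a run resumable, pin --workdir to a fixed path instead of a timestamped one and simply re-launch the same command after an interruption (a sketch; the workdir path here is just an example):

```sh
# The first launch and any later re-launches use the exact same workdir,
# so training resumes from the last checkpoint found there.
python3 -m big_vision.train \
  --config big_vision/configs/vit_s16_i1k.py \
  --workdir workdirs/vit_s16_i1k_run1
```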
Each configuration file contains a comment at the top with a COMMAND snippet to run it, and some hint of expected runtime and results. See below for more details, but generally speaking, running on a GPU machine involves calling
python -m COMMAND
while running on TPUs, including multi-host, involves
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "bash big_vision/run_tpu.sh COMMAND"
See instructions below for more details on how to run big_vision
code on a
GPU machine or Google Cloud TPU.
By default we write checkpoints and logfiles. The logfiles are a list of JSON objects, and we provide a short and straightforward example colab to read and display the logs and checkpoints.
The first release contains the core part of pre-training, transferring, and evaluating classification models at scale on Cloud TPU VMs.
We have since added the following key features and projects:
Features and projects we plan to release in the near future, in no particular order:
We will continue releasing code of our future publications developed within
big_vision
here.
The following exist in the internal variant of this codebase, and there is no plan for their release:
We first discuss how to set up and run big_vision on a (local) GPU machine, and then discuss the setup for Cloud TPUs. Note that the data preparation step for the (local) GPU setup can be largely reused for the Cloud TPU setup. While the instructions skip this for brevity, we highly recommend using a virtual environment when installing python dependencies.
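For example, a minimal virtual environment setup could look as follows (the environment path is just an example, matching the ~/bv_venv path used on TPU VMs later in this document):

```sh
python3 -m venv ~/bv_venv
source ~/bv_venv/bin/activate
pip3 install --upgrade pip
```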
The first step is to check out big_vision and install the relevant python dependencies:
git clone https://github.com/google-research/big_vision
cd big_vision/
pip3 install --upgrade pip
pip3 install -r big_vision/requirements.txt
The latest version of the jax library can be fetched as
pip3 install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
You may need a different jax package, depending on the CUDA and cuDNN libraries installed on your machine. Please consult the official jax documentation for more information.
For unified and reproducible access to standard datasets we opted to use the tensorflow_datasets (tfds) library. It requires each dataset to be downloaded, preprocessed and then stored on a hard drive (or, if you use Google Cloud, preferably stored in a GCP bucket).
Many datasets can be downloaded and preprocessed automatically when used for the first time. Nevertheless, we intentionally disable this feature and recommend doing the dataset preparation step separately, ahead of the first run. It makes debugging easier if problems arise, and some datasets, like imagenet2012, require manually downloaded data.
Most of the datasets, e.g. cifar100, oxford_iiit_pet or imagenet_v2, can be fully automatically downloaded and prepared by running
cd big_vision/
python3 -m big_vision.tools.download_tfds_datasets cifar100 oxford_iiit_pet imagenet_v2
A full list of datasets is available at this link.
Some datasets, like imagenet2012 or imagenet2012_real, require the data to be downloaded manually and placed into $TFDS_DATA_DIR/downloads/manual/, which defaults to ~/tensorflow_datasets/downloads/manual/. For example, for imagenet2012 and imagenet2012_real one needs to place the official ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar files in that directory and then run
python3 -m big_vision.tools.download_tfds_datasets imagenet2012 imagenet2012_real
(which may take ~1 hour).
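For concreteness, the manual placement step above could look like this (a sketch, assuming the default TFDS_DATA_DIR and that the two tar files were already downloaded to the current directory):

```sh
mkdir -p ~/tensorflow_datasets/downloads/manual/
mv ILSVRC2012_img_train.tar ILSVRC2012_img_val.tar ~/tensorflow_datasets/downloads/manual/
```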
If you use Google Cloud, and TPUs in particular, you can then upload the preprocessed data (stored in $TFDS_DATA_DIR) to a Google Cloud Storage bucket and use that bucket on any of your (TPU) virtual machines to access the data.
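For example, the upload can be done with gsutil (a sketch, assuming the default TFDS_DATA_DIR and the $GS_BUCKET_NAME variable defined in the Cloud TPU section below):

```sh
gsutil -m cp -r ~/tensorflow_datasets gs://$GS_BUCKET_NAME
```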
Finally, after installing all python dependencies and preparing the tfds data, the user can run a job using the config of their choice. For example, to train a ViT-S/16 model on ImageNet data, run the following command:
python3 -m big_vision.train --config big_vision/configs/vit_s16_i1k.py --workdir workdirs/`date '+%m-%d_%H%M'`
or to train MLP-Mixer-B/16, run (note the gpu8
config param that reduces the default batch size and epoch count):
python3 -m big_vision.train --config big_vision/configs/mlp_mixer_i1k.py:gpu8 --workdir workdirs/`date '+%m-%d_%H%M'`
To create a single machine with 8 TPU cores, follow this Cloud TPU JAX document: https://cloud.google.com/tpu/docs/run-calculation-jax
To support large-scale vision research, more cores with multiple hosts are recommended. Below we provide instructions on how to do it.
First, create some useful variables, which will be reused:
export NAME=<a name of the TPU deployment, e.g. my-tpu-machine>
export ZONE=<GCP geographical zone, e.g. europe-west4-a>
export GS_BUCKET_NAME=<Name of the storage bucket, e.g. my_bucket>
The following command line will create TPU VMs with 32 cores across 4 hosts.
gcloud compute tpus tpu-vm create $NAME --zone $ZONE --accelerator-type v3-32 --version tpu-ubuntu2204-base
Install big_vision on TPU VMs

Fetch the big_vision repository, copy it to all TPU VM hosts, and install dependencies.
git clone https://github.com/google-research/big_vision
gcloud compute tpus tpu-vm scp --recurse big_vision/big_vision $NAME: --zone=$ZONE --worker=all
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "bash big_vision/run_tpu.sh"
We recommend preparing the tfds data locally as described above and then uploading the data to a Google Cloud bucket. However, if you prefer, the datasets which do not require manual downloads can be prepared automatically using a TPU machine as described below. Note that TPU machines have only 100 GB of disk space, and multihost TPU slices do not allow external disks to be attached in write mode, so the instructions below may not work for preparing large datasets. As yet another alternative, we provide instructions on how to prepare tfds data on a CPU-only GCP machine.
Specifically, the seven TFDS datasets used during evaluations will be generated under ~/tensorflow_datasets on the TPU machine with this command:
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "TFDS_DATA_DIR=~/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.tools.download_tfds_datasets cifar10 cifar100 oxford_iiit_pet oxford_flowers102 cars196 dtd uc_merced"
You can then copy the datasets to your GS bucket to make them accessible to all TPU workers.
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "rm -r ~/tensorflow_datasets/downloads && gsutil cp -r ~/tensorflow_datasets gs://$GS_BUCKET_NAME"
If you want to integrate other public or custom datasets, e.g. imagenet2012, please follow the official guideline.
For the full list of pre-trained models, check out the load function defined in the same module as the model code. For an example config showing how to use these models, see configs/transfer.py.
The following command line fine-tunes a pre-trained vit-i21k-augreg-b/32 model on the cifar10 dataset.
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/transfer.py:model=vit-i21k-augreg-b/32,dataset=cifar10,crop=resmall_crop --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'` --config.lr=0.03"
To train your own big_vision models on a large dataset,
e.g. imagenet2012
(prepare the TFDS dataset),
run the following command line.
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/bit_i1k.py --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'`"
big_vision supports flexible parameter and model sharding strategies. Currently, we support the popular FSDP sharding via a simple config change; see this config example.
For example, to run FSDP fine-tuning of a pretrained ViT-L model, run the following command (possibly adjusting the batch size depending on your hardware):
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/transfer.py:model=vit-i21k-augreg-l/16,dataset=oxford_iiit_pet,crop=resmall_crop,fsdp=True,batch_size=256 --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'` --config.lr=0.03"
A minimal example that uses public coco
captions data:
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.trainers.proj.image_text.siglip --config big_vision/configs/proj/image_text/siglip_lit_coco.py --workdir gs://$GS_BUCKET_NAME/big_vision/`date '+%Y-%m-%d_%H%M'`"
When you are done, the following commands delete the TPU VM, or just remove the big_vision code and the virtual environment from it:
gcloud compute tpus tpu-vm delete $NAME --zone $ZONE
gcloud compute tpus tpu-vm ssh $NAME --zone $ZONE --worker=all --command 'rm -rf ~/big_vision ~/bv_venv'
Preparing tfds data on a standalone GCP CPU machine

First create a new machine and a disk (feel free to adjust the exact machine type and disk settings/capacity):
export NAME_CPU_HOST=<A name of a CPU-only machine>
export NAME_DISK=<A name of a disk>
gcloud compute instances create $NAME_CPU_HOST --machine-type c3-standard-22 --zone $ZONE --image-family ubuntu-2204-lts --image-project ubuntu-os-cloud
gcloud compute disks create $NAME_DISK --size 1000GB --zone $ZONE --type pd-balanced
Now attach the disk to the newly created machine:
gcloud compute instances attach-disk $NAME_CPU_HOST --disk $NAME_DISK --zone $ZONE
Next, ssh to the machine with gcloud compute ssh $NAME_CPU_HOST --zone=$ZONE and follow the instructions to format and mount the disk. Let's assume it was mounted to /mnt/disks/tfds.
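As a sketch, assuming the attached disk shows up as /dev/sdb (check with lsblk), formatting and mounting could look like this:

```sh
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mkdir -p /mnt/disks/tfds
sudo mount -o discard,defaults /dev/sdb /mnt/disks/tfds
sudo chmod a+w /mnt/disks/tfds   # allow non-root writes, so tfds can write here
```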
Almost there; now clone and set up big_vision:
gcloud compute ssh $NAME_CPU_HOST --zone=$ZONE --command "git clone https://github.com/google-research/big_vision.git && cd big_vision && sh big_vision/run_tpu.sh"
Finally, prepare the dataset (e.g. coco_captions) using the utility script and copy the result to your Google Cloud bucket:
gcloud compute ssh $NAME_CPU_HOST --zone=$ZONE --command "cd big_vision && TFDS_DATA_DIR=/mnt/disks/tfds/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.tools.download_tfds_datasets coco_captions"
gcloud compute ssh $NAME_CPU_HOST --zone=$ZONE --command "rm -rf /mnt/disks/tfds/tensorflow_datasets/downloads && gsutil cp -r /mnt/disks/tfds/tensorflow_datasets gs://$GS_BUCKET_NAME"
We provide a well-tuned ViT-S/16 baseline in the config file named vit_s16_i1k.py. It achieves 76.5% accuracy on the ImageNet validation split in 90 epochs of training, making it a strong and simple starting point for research on ViT models.
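A GPU command for this config was shown earlier. On a multi-host TPU VM set up as described above, the same baseline can be launched following the usual pattern (a sketch composed from the TPU commands in this document):

```sh
gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/vit_s16_i1k.py --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'`"
```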
Please see our arXiv note for more details, and if this baseline happens to be useful for your research, consider citing:
@article{vit_baseline,
url = {https://arxiv.org/abs/2205.01580},
author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
title = {Better plain ViT baselines for ImageNet-1k},
journal={arXiv preprint arXiv:2205.01580},
year = {2022},
}
For each project we list the last known commit where the project-specific code is expected to work. The core code and configs are expected to work at head.
If you found this codebase useful for your research, please consider using the following BibTeX to cite it:
@misc{big_vision,
author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
title = {Big Vision},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/google-research/big_vision}}
}
This is not an official Google Product.
Unless explicitly noted otherwise, everything in the big_vision codebase (including models and colabs) is released under the Apache2 license. See the LICENSE file for the full license text.