google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Question about ViT-augreg ("How to train?") fine-tuning transfer #60

Closed: lucasb-eyer closed this issue 8 months ago

lucasb-eyer commented 8 months ago

We got the following question by e-mail from @alexlioralexli, but we think it's of general interest:

  1. What are the details of the fine-tuning process?
  2. Which commands reproduce the pre-training and fine-tuning runs from the paper?
lucasb-eyer commented 8 months ago

First answer, by @andsteing:

  1. We used our default transfer config, big_vision/configs/transfer.py, which uses an inception crop and a random horizontal flip (see the preprocessing section of that config; a sketch of such a pipeline follows this list).
  2. As for pre-training, refer to these configs: big_vision/configs/vit_i21k.py and big_vision/configs/vit_i1k.py (see the module pydoc for more information).
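
For reference, here is a minimal sketch of what such a transfer preprocessing setup can look like as big_vision pp strings. The op names and the 224px resolution here are illustrative assumptions; defer to big_vision/configs/transfer.py for the authoritative values.

```python
import ml_collections

def get_transfer_pp_config(num_classes=1000):
  """Sketch of train/eval preprocessing strings in big_vision pp style.

  The exact ops and resolutions are assumptions; transfer.py is authoritative.
  """
  config = ml_collections.ConfigDict()
  # Train: inception crop + random horizontal flip, as described above.
  config.pp_train = (
      'decode|inception_crop(224)|flip_lr|value_range(-1, 1)'
      f'|onehot({num_classes}, key="label", key_result="labels")'
      '|keep("image", "labels")'
  )
  # Eval: deterministic resize + central crop, no augmentation.
  config.pp_eval = (
      'decode|resize_small(256)|central_crop(224)|value_range(-1, 1)'
      f'|onehot({num_classes}, key="label", key_result="labels")'
      '|keep("image", "labels")'
  )
  return config
```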
lucasb-eyer commented 8 months ago

And an addition from me, after checking the old training logs; a free-form summary:

We select these hyperparameters on minival (held out from the training split):

  1. We sweep the learning rate over 0.03, 0.01, 0.003, 0.001.
  2. We sweep the number of fine-tuning steps over 500, 2500, 10k, 20k (a selection sketch follows this list).
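
Spelled out, the selection is a small grid search over the 4x4 (lr, steps) grid, keeping the best minival score. A minimal sketch, assuming a hypothetical `finetune_and_eval` callable that runs one fine-tuning job and returns minival accuracy:

```python
import itertools

# The (lr, steps) grid from the sweep above.
LEARNING_RATES = (0.03, 0.01, 0.003, 0.001)
FINETUNE_STEPS = (500, 2_500, 10_000, 20_000)

def select_on_minival(finetune_and_eval):
  """Grid search: `finetune_and_eval(lr, total_steps)` is a hypothetical
  callable that fine-tunes once and returns minival accuracy."""
  best = None
  for lr, steps in itertools.product(LEARNING_RATES, FINETUNE_STEPS):
    acc = finetune_and_eval(lr=lr, total_steps=steps)
    if best is None or acc > best[0]:
      best = (acc, lr, steps)
  return best  # (minival_accuracy, lr, steps) of the winning setting.
```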

A couple of probably important, fixed (not swept) settings, which should be visible in the config Andreas linked: