Is random resized crop used when pretraining vision transformers on imagenet21k?

google-research / vision_transformer

Apache License 2.0

Open Phuoc-Hoan-Le opened 2 years ago

Phuoc-Hoan-Le commented 2 years ago

Is random resized crop used when pretraining Vision Transformers on ImageNet-21k, or is squish-resizing used?

andsteing commented 2 years ago

The images are pre-processed by Inception-style cropping – see Section 3.4 of the "How to train your ViT?" paper.
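
For readers unfamiliar with the term: Inception-style cropping samples a random sub-rectangle of the image (random area, random aspect ratio) and then resizes that crop to the training resolution, which is essentially what other libraries call a random resized crop. The sketch below only illustrates how such a crop box might be sampled. It is not the repository's code: the function name `sample_inception_crop`, the area range (0.08 to 1.0), the aspect-ratio range (3/4 to 4/3), and the retry count are all assumptions chosen for illustration, so check the paper and the actual input pipeline for the values used in pretraining.

```python
import math
import random


def sample_inception_crop(height, width, rng,
                          area_range=(0.08, 1.0),       # assumed fraction of image area
                          aspect_range=(3 / 4, 4 / 3),  # assumed width / height range
                          max_attempts=10):             # assumed retry budget
    """Return (top, left, crop_h, crop_w) for one Inception-style random crop."""
    log_lo, log_hi = math.log(aspect_range[0]), math.log(aspect_range[1])
    for _ in range(max_attempts):
        # Sample the crop area as a fraction of the full image area.
        target_area = rng.uniform(*area_range) * height * width
        # Sample the aspect ratio log-uniformly so wide and tall crops are equally likely.
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            # Place the box uniformly at random inside the image.
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # No valid box found within the retry budget: fall back to the whole image.
    return 0, 0, height, width


rng = random.Random(0)
top, left, crop_h, crop_w = sample_inception_crop(375, 500, rng)
```

The sampled box would then be cut out and resized to the model's input resolution (for example 224×224). Any "squishing" therefore happens to the random crop, not to the full image.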