SHI-Labs / Compact-Transformers

Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)
https://arxiv.org/abs/2104.05704
Apache License 2.0

The question about Vit-lite model #57

Closed TIEHua closed 2 years ago

TIEHua commented 2 years ago

Hi, sorry to bother you. I have a question about the ViT-Lite model. In ViT-Lite, shouldn't the number of convolution layers in the Tokenizer be 0? But I found that conv_layers is defined as 1 for ViT-Lite. So when I use ViT-Lite, should the command look like this?

python3 main.py --dataset cifar10 --epochs 300 --lr 0.001 --model vit_lite_7 --patch-size 4 --conv-size 3 --conv-layers 0 --warmup 10 --batch-size 128 ./cifar10

TIEHua commented 2 years ago

I found that conv_layers is defined as 1. When kernel_size and stride are both set equal to patch_size, the convolution can be seen as dividing the image into N patches and applying a linear projection to each. I'm not sure whether I'm understanding this correctly.

alihassanijr commented 2 years ago

Hello, thank you for your interest. A single PxP convolution with a PxP stride is equivalent to PxP patching followed by a linear embedding. That's why it should be 1.
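For anyone reading along, this equivalence is easy to verify numerically. The sketch below (not the repo's actual code; all names are illustrative) shows that a single PxP conv with stride P and no bias produces the same token embeddings as manually splitting the image into PxP patches and applying a shared linear projection with the same weights:

```python
import torch
import torch.nn as nn

# Illustrative sizes: patch size P, input channels C, embedding dim D.
P, C, D = 4, 3, 64
x = torch.randn(2, C, 32, 32)  # a batch of 2 CIFAR-sized images

# Path 1: a single PxP conv with PxP stride (what the tokenizer does).
conv = nn.Conv2d(C, D, kernel_size=P, stride=P, bias=False)
out_conv = conv(x).flatten(2).transpose(1, 2)          # (B, N, D)

# Path 2: explicit PxP patching + linear embedding with the same weights.
linear = nn.Linear(C * P * P, D, bias=False)
linear.weight.data = conv.weight.data.reshape(D, -1)   # (D, C*P*P)

patches = x.unfold(2, P, P).unfold(3, P, P)            # (B, C, 8, 8, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(x.size(0), -1, C * P * P)
out_linear = linear(patches)                           # (B, N, D)

print(torch.allclose(out_conv, out_linear, atol=1e-5))
```

Both paths yield the same (B, N, D) token tensor, which is why setting conv_layers to 1 with kernel and stride equal to the patch size reproduces standard ViT patch embedding.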