Closed. TIEHua closed this 2 years ago.
I found that conv_layers is defined as 1. When kernel_size and stride are both set equal to patch_size, the convolution can be seen as dividing the image into N patches and applying a linear projection to each one. Is my understanding correct?
Hello, thank you for your interest. A single PxP conv with a stride of P is equivalent to splitting the image into PxP patches and linearly embedding each one. That's why conv_layers should be 1.
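To make the equivalence concrete, here is a minimal PyTorch sketch, not the repository's actual Tokenizer code. It compares a single conv with kernel_size = stride = P against explicit patching followed by a linear projection that reuses the same weights. The embedding dimension, batch size, and variable names are illustrative assumptions; only the patch size of 4 and the CIFAR-sized input come from this thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

P = 4            # patch size (as in --patch-size 4)
C_in, D = 3, 8   # input channels and embedding dim (D is an illustrative value)

x = torch.randn(2, C_in, 32, 32)  # a small CIFAR-sized batch

# Path 1: one convolution with kernel_size = stride = P.
conv = nn.Conv2d(C_in, D, kernel_size=P, stride=P, bias=True)
tokens_conv = conv(x).flatten(2).transpose(1, 2)       # (B, N, D)

# Path 2: cut the image into non-overlapping PxP patches, flatten each patch,
# then apply a linear projection built from the same conv weights.
patches = F.unfold(x, kernel_size=P, stride=P)         # (B, C_in*P*P, N)
patches = patches.transpose(1, 2)                      # (B, N, C_in*P*P)
weight = conv.weight.reshape(D, -1)                    # (D, C_in*P*P)
tokens_linear = patches @ weight.t() + conv.bias       # (B, N, D)

# The two paths should agree up to floating-point error.
max_abs_diff = (tokens_conv - tokens_linear).abs().max().item()
num_patches = tokens_conv.shape[1]                     # (32 / P) ** 2
print(tokens_conv.shape, num_patches, max_abs_diff)
```

Both paths produce one token per patch, so a single conv layer already does all the patch-and-embed work; setting the layer count to 0 would leave nothing to produce the tokens.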
Hi, sorry to bother you. I have a question about the ViT-Lite model. In ViT-Lite, the number of convolution layers in the Tokenizer should be 0, but I found that conv_layers is defined as 1. So when I use ViT-Lite, should the command look like this?
python3 main.py --dataset cifar10 --epochs 300 --lr 0.001 --model vit_lite_7 --patch-size 4 --conv-size 3 --conv-layers 0 --warmup 10 --batch-size 128 ./cifar10