SHI-Labs / Compact-Transformers

Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)
https://arxiv.org/abs/2104.05704
Apache License 2.0

Calculation of No. of trainable parameters in ViT #43

zhoutianyu16tue closed this issue 2 years ago

zhoutianyu16tue commented 2 years ago

Hi,

I was reading the medium post, https://medium.com/pytorch/training-compact-transformers-from-scratch-in-30-minutes-with-pytorch-ff5c21668ed5.

One screenshot shows a table of models and their parameter counts. Could you also share the value of embed_dim?

With the default value of 768, it doesn't seem likely that 0.2M parameters is correct for the model in the first row.

Best, Tianyu.
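For anyone with the same question, a minimal way to count the trainable parameters of any PyTorch model (a generic sketch, assuming you have already constructed a model instance from this repo or elsewhere):

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    # Sum the element counts of every parameter that requires gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```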

alihassanijr commented 2 years ago

Hello, thank you for your interest. The embedding dimension per head is 64, so the smallest model, with 2 heads, has a total embedding dimension of 128; the largest, with 4 heads, is 256-dimensional. I believe these are noted in the paper, and by loading one of the models or looking at its source, it will be clear that none of our models use an embedding dimension of 768:

https://github.com/SHI-Labs/Compact-Transformers/blob/e6869b4326e252bf9d6c1e5439ab824f2f4d7f2a/src/vit.py#L83-L100
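As a rough sanity check (a sketch only; the exact per-model settings are in the linked source, and the configurations below are illustrative rather than the repo's exact ones): a standard pre-norm transformer block has roughly 4·d² attention parameters plus 2·r·d² MLP parameters, where d is the embedding dimension and r the MLP ratio, so a couple of blocks at d = 128 land near 0.2M parameters, while d = 768 would be far larger.

```python
def encoder_param_estimate(embed_dim: int, depth: int, mlp_ratio: float = 1.0) -> int:
    """Rough encoder-only estimate for a standard pre-norm transformer block:
    ~4*d^2 for attention (QKV + output projection) plus ~2*r*d^2 for the MLP.
    Ignores biases, LayerNorms, the tokenizer/patch embedding, and the classifier head.
    """
    d = embed_dim
    per_block = 4 * d * d + int(2 * mlp_ratio * d * d)
    return depth * per_block

# Illustrative configurations only, not the repo's exact model definitions:
print(encoder_param_estimate(embed_dim=128, depth=2, mlp_ratio=1))  # ~196K, i.e. ~0.2M
print(encoder_param_estimate(embed_dim=768, depth=2, mlp_ratio=4))  # ~14M, far larger
```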

I'll close this issue for now, but feel free to open it back up in case you have any other questions.