SHI-Labs / Compact-Transformers

Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)
https://arxiv.org/abs/2104.05704
Apache License 2.0

Calculation of No. of trainable parameters in ViT #43

zhoutianyu16tue closed this issue 2 years ago

zhoutianyu16tue commented 2 years ago

Hi,

I was reading the medium post, https://medium.com/pytorch/training-compact-transformers-from-scratch-in-30-minutes-with-pytorch-ff5c21668ed5.

One screenshot shows a table of models and their parameter counts. Could you also share the value of embed_dim?

With the default value of 768, it doesn't seem likely that 0.2M parameters is correct for the model in the first row.

Best, Tianyu.
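For anyone with the same question, a minimal way to count the trainable parameters of any PyTorch model (a generic sketch, assuming you have already constructed a model instance from this repo or elsewhere):

```python
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    # Sum the element counts of every parameter that requires gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```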

alihassanijr commented 2 years ago

Hello, thank you for your interest. The embedding dimension per head is 64, so the smallest model, with 2 heads, has a total embedding dimension of 128; the largest, with 4 heads, is 256-dimensional. I believe these are noted in the paper, and by loading one of the models or looking at its source, it will be clear that none of our models use an embedding dimension of 768:

https://github.com/SHI-Labs/Compact-Transformers/blob/e6869b4326e252bf9d6c1e5439ab824f2f4d7f2a/src/vit.py#L83-L100
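As a rough sanity check (a sketch only; the exact per-model settings are in the linked source, and the configurations below are illustrative rather than the repo's exact ones): a standard pre-norm transformer block has roughly 4·d² attention parameters plus 2·r·d² MLP parameters, where d is the embedding dimension and r the MLP ratio, so a couple of blocks at d = 128 land near 0.2M parameters, while d = 768 would be far larger.

```python
def encoder_param_estimate(embed_dim: int, depth: int, mlp_ratio: float = 1.0) -> int:
    """Rough encoder-only estimate for a standard pre-norm transformer block:
    ~4*d^2 for attention (QKV + output projection) plus ~2*r*d^2 for the MLP.
    Ignores biases, LayerNorms, the tokenizer/patch embedding, and the classifier head.
    """
    d = embed_dim
    per_block = 4 * d * d + int(2 * mlp_ratio * d * d)
    return depth * per_block

# Illustrative configurations only, not the repo's exact model definitions:
print(encoder_param_estimate(embed_dim=128, depth=2, mlp_ratio=1))  # ~196K, i.e. ~0.2M
print(encoder_param_estimate(embed_dim=768, depth=2, mlp_ratio=4))  # ~14M, far larger
```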

I'll close this issue for now, but feel free to open it back up in case you have any other questions.