facebookresearch / deit

Official DeiT repository

What batch sizes other than 1024 have been tried when training a DeiT model? #205

Open Phuoc-Hoan-Le opened 1 year ago

Phuoc-Hoan-Le commented 1 year ago

What batch sizes other than 1024 have been tried when training a DeiT or ViT model? In the DeiT paper (https://arxiv.org/abs/2012.12877), the authors used a batch size of 1024 and mentioned that the learning rate should be scaled according to the batch size.
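
For context, the scaling the paper describes is a linear rule: the learning rate is multiplied by the effective batch size divided by 512, starting from a base learning rate of 5e-4. A minimal sketch of applying that rule is below; the AdamW settings other than the scaled learning rate, and the placeholder model, are assumptions for illustration, not the exact training script.

```python
# Sketch of the linear learning-rate scaling rule from the DeiT paper:
# lr_scaled = base_lr * batch_size / 512, with base_lr = 5e-4.
import torch


def scaled_lr(base_lr: float, batch_size: int, world_size: int = 1) -> float:
    """Scale the learning rate linearly with the effective (global) batch size."""
    return base_lr * batch_size * world_size / 512.0


# Example: single-GPU training with a batch size of 256.
lr = scaled_lr(base_lr=5e-4, batch_size=256)      # -> 2.5e-4
model = torch.nn.Linear(10, 10)                   # placeholder for the actual DeiT model
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)
```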

However, I was wondering whether anyone has experience successfully training a DeiT model with a batch size smaller than 512. If so, what accuracy did you achieve?