Closed ziqipang closed 4 months ago
Hey. Did you find any configs?
@Alihjt No luck. One thing I noticed was that the per-GPU batch size seemed to influence numerical stability (accuracy improved with a smaller per-GPU batch size). Although I haven't had the chance to verify or explain the reason, using 16 GPUs x 64 images per GPU gave better performance than my previous run (8 GPUs x 128 images per GPU).
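To illustrate why the per-GPU batch size is the suspect here: both configurations above have the same global batch size, so the learning-rate scaling should be unaffected. A minimal sketch (the helper name is hypothetical, not part of the DeiT codebase):

```python
# Hypothetical helper comparing the two distributed configurations above.
# In synchronous data parallelism, the optimizer sees num_gpus * per_gpu_batch
# samples per step regardless of how they are split across devices.

def global_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    """Effective batch size seen by the optimizer per training step."""
    return num_gpus * per_gpu_batch

old_run = global_batch_size(num_gpus=8, per_gpu_batch=128)   # 1024
new_run = global_batch_size(num_gpus=16, per_gpu_batch=64)   # 1024

# Both runs use a global batch of 1024, so any accuracy gap comes from the
# per-GPU micro-batch itself (e.g., the order of mixed-precision reductions),
# not from a change in the effective learning-rate schedule.
print(old_run, new_run)
```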
Thank you for your excellent work and for sharing the code! I learned a lot from what you have described.
Recently, I have been trying to use DeiT to train a plain ViT-Base model. I could follow the documentation to reproduce the ViT-Tiny and ViT-Small results, but the same training procedure on ViT-Base only reaches 78.9% accuracy on ImageNet-1K, which is even worse than ViT-Small.
Therefore, I am wondering what could be the hidden tricks for training a good ViT-Base. Could you please share some hints? Thank you so much for the help!