NVlabs / A-ViT

Official PyTorch implementation of A-ViT: Adaptive Tokens for Efficient Vision Transformer (CVPR 2022)
Apache License 2.0

Unable to Reproduce top-1 Accuracy #5

Open mehtadushy opened 2 years ago

mehtadushy commented 2 years ago

Hi

I trained avit_tiny with the provided hyperparameters, but the validation accuracy on ImageNet is only 68.2% instead of the reported 71.4%.

Could you please let me know which hyperparameters to use to reproduce the results in the paper?

hongxuyin commented 2 years ago

Hi mehtadushy, thanks for letting us know. This set of hyperparameters yields the accuracy of the provided checkpoint. Can you share your training environment and the exact command you ran?

mehtadushy commented 2 years ago

I am using PyTorch 1.9. If I train avit_tiny with the avit_small batch-size hyperparameters, I can match the reported numbers, but they still differ from the paper: I get 72.1% for avit_tiny, 79.4% for avit_small, and 79.9% for the DeiT baseline rather than the 78.9% reported in the paper. I do not know whether the higher avit_small/tiny performance in my retraining is due to a difference in FLOPs or something else. Would you be able to share the FLOP-calculation code for this model?
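In the meantime, here is a minimal sketch of how one might estimate FLOPs for an adaptive-token ViT. This is not the authors' code; the function names, the MLP ratio, and the per-layer token counts below are illustrative assumptions. The key idea is that once A-ViT halts tokens, deeper blocks operate on fewer active tokens, so the per-block cost depends on the token count alive at that depth.

```python
# Hypothetical FLOP estimate (multiply-accumulates) for a ViT with token halting.
# All names and the example token schedule are assumptions, not the repo's API.

def vit_block_flops(num_tokens: int, dim: int, mlp_ratio: float = 4.0) -> int:
    """Approximate MAC count for one transformer block with `num_tokens` active tokens."""
    n, d = num_tokens, dim
    qkv = 3 * n * d * d                    # Q, K, V projections
    attn = 2 * n * n * d                   # QK^T scores + attention-weighted sum over V
    proj = n * d * d                       # attention output projection
    mlp = 2 * n * d * int(mlp_ratio * d)   # two-layer MLP
    return qkv + attn + proj + mlp

def model_flops(tokens_per_layer, dim: int) -> int:
    """Sum block costs over depth, using the token count active at each layer."""
    return sum(vit_block_flops(n, dim) for n in tokens_per_layer)

if __name__ == "__main__":
    # DeiT-tiny-like config: 12 layers, embed dim 192, 197 tokens (196 patches + CLS).
    baseline = model_flops([197] * 12, 192)
    # Adaptive run: tokens halt progressively (token counts here are made up).
    adaptive = model_flops([197, 197, 190, 170, 150, 130, 110, 95, 80, 70, 60, 50], 192)
    print(f"baseline MACs: {baseline / 1e9:.2f} G, adaptive: {adaptive / 1e9:.2f} G")
```

Comparing the two totals for matched accuracy would show whether a retrained model is actually spending more compute than the checkpoint, which could explain an accuracy gap.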