NVlabs / A-ViT

Official PyTorch implementation of A-ViT: Adaptive Tokens for Efficient Vision Transformer (CVPR 2022)

Training accuracy #3

Open Mandy-77 opened 1 year ago

Mandy-77 commented 1 year ago

Thanks for your interesting and excellent work. I reran the training code using avit-tiny but only got 68.26% top-1 accuracy on ImageNet. Would differences in the training process cause that large a gap? Additionally, how do you actually 'remove' the stopped tokens at the inference stage to reduce inference time when batch size > 1?
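(For reference while waiting on the authors: a minimal sketch of what physically dropping halted tokens could look like at batch size 1. The function name `drop_halted_tokens`, the cumulative halting score, and the 0.99 threshold are illustrative assumptions, not the repo's API.)

```python
import torch

# Hypothetical sketch, not the repo's code: at batch size 1, tokens whose
# cumulative halting score has crossed the threshold can be physically
# removed before the next block, shrinking N and the O(N^2) attention cost.
def drop_halted_tokens(tokens, halt_score, threshold=0.99):
    """tokens: (1, N, D); halt_score: (1, N) cumulative halting probability."""
    keep = halt_score[0] < threshold   # (N,) bool mask of still-active tokens
    return tokens[:, keep, :]          # (1, N_active, D)

x = torch.randn(1, 197, 192)           # e.g. DeiT-tiny: 196 patches + CLS token
h = torch.rand(1, 197)
h[0, 0] = 0.0                          # keep the CLS token active
x = drop_halted_tokens(x, h)
print(x.shape)                         # shorter sequence -> less attention compute
```

At batch size > 1 this indexing no longer yields a rectangular tensor, since each sample halts a different set of tokens, which is exactly the difficulty raised above.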

dk-liang commented 1 year ago

"how to actually 'remove' those stopped tokens in the inference stage to reduce inference time when batchsize>1?"

I have the same question.

Could the authors give some explanation?

hongxuyin commented 1 year ago

Hi Mandy, thanks for letting us know. This setup yields the accuracy of the provided checkpoint. Can you share your training environment and the exact code you ran? Also hi dk, we will update the repository with more snippets in coming versions. Stay tuned.
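(In the meantime, a hedged sketch of the masking alternative for batch size > 1, assuming per-sample boolean halt masks; `mask_halted_attention` is an illustrative name, not the repository's API. Masking keeps the tensor rectangular, so it preserves correctness but not the FLOP savings; actual wall-clock gains at batch > 1 would need token removal via per-sample bucketing or similar.)

```python
import torch

# Hypothetical sketch, not the authors' code: with batch size > 1 each sample
# halts a different token subset, so halted positions are masked out of
# attention instead of removed. attn_logits: (B, heads, N, N); halted: (B, N).
def mask_halted_attention(attn_logits, halted):
    mask = halted[:, None, None, :]            # broadcast over heads and queries
    return attn_logits.masked_fill(mask, float("-inf"))

logits = torch.randn(2, 3, 197, 197)
halted = torch.rand(2, 197) > 0.5
halted[:, 0] = False                           # never halt the CLS token
probs = mask_halted_attention(logits, halted).softmax(dim=-1)
print(probs.shape)                             # (2, 3, 197, 197), halted keys ignored
```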