Closed · HelloWXJ1024 closed this issue 1 year ago

Thank you for your excellent work! But when I apply ToMe to DeiT-S without training, I get Acc@1 78.826%, which is lower than the 79.4% reported in Table 4. Do you know where the gap comes from?
I also observe something similar. Applying this to a fine-tuned ViT-L/16 MAE model, I get:
Acc@1 85.948, Acc@5 97.560, loss 0.646 (for r=0; 78 s runtime)
Acc@1 81.984, Acc@5 96.350, loss 1.037 (for r=8; 57 s runtime), without proportional attention.
The speedup is 1.36x.
However, Table 10(a) of the paper reports Acc@1 85.66 (for r=0), Acc@1 83.92 (for r=8), and a 1.97x speedup.
Could you please clarify?
Hi, thanks for your interest!
> But when I apply ToMe to DeiT-S without training, I get Acc@1 78.826%, which is lower than the 79.4% reported in Table 4. Do you know where the gap comes from?
The DeiT number in that table is with training (as marked by the gray color). The off-the-shelf number below it (marked in blue) is an AugReg model (i.e., from timm). If you want to reproduce that number, use the off-the-shelf ViT-S model from timm.
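For reference, here is a minimal sketch of that setup, assuming the `tome.patch.timm` entry point from the repo README and the default timm model name (both may differ across versions):

```python
# Sketch: evaluate an off-the-shelf ViT-S with ToMe, no training.
# Assumes the tome.patch.timm entry point described in the ToMe README.
import timm
import tome

# In recent timm releases, the default pretrained ViT-S/16 weights are AugReg weights.
model = timm.create_model("vit_small_patch16_224", pretrained=True)

tome.patch.timm(model)  # patch the transformer blocks in place to merge tokens
model.r = 16            # tokens merged per layer; r=0 disables merging entirely
model.eval()
```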
> I also observe something similar. Applying this to a fine-tuned ViT-L/16 MAE model, I get:
It's possible that the MAE implementation in the released code isn't correct. Thanks for testing this; I will look into it. The timm implementation is correct, however.
> The speedup is 1.36x.
As for timing, make sure you're only timing the model itself. If you time the entire dataset evaluation, you're also measuring things like data loading and moving tensors from CPU to GPU, which have nothing to do with the model.
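As a concrete illustration, here is a minimal, self-contained timing sketch (this is not the repo's own `utils.benchmark`; the batch size, input size, and iteration counts are placeholders):

```python
# Sketch: measure model throughput only, excluding data loading and host-to-device copies.
import time
import torch

@torch.no_grad()
def throughput(model: torch.nn.Module, batch_size: int = 64, runs: int = 50) -> float:
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    for _ in range(10):           # warm-up so one-time setup costs don't skew the timing
        model(x)
    torch.cuda.synchronize()      # ensure all queued kernels have finished
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()      # wait for the timed forward passes to complete
    return runs * batch_size / (time.perf_counter() - start)  # images per second
```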
There was indeed a feature missing from the MAE code. I have created a separate patch for MAE to deal with this.
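For anyone following along, applying the new patch would look roughly like this; `models_vit` and the checkpoint name come from the facebookresearch/mae repository, and `tome.patch.mae` is an assumed entry point mirroring `tome.patch.timm`:

```python
# Hedged sketch: apply the MAE-specific ToMe patch to a fine-tuned ViT-L/16.
import torch
import tome
import models_vit  # from the facebookresearch/mae repository

model = models_vit.vit_large_patch16(num_classes=1000, global_pool=True)
ckpt = torch.load("mae_finetuned_vit_large.pth", map_location="cpu")  # released MAE checkpoint
model.load_state_dict(ckpt["model"])

tome.patch.mae(model)  # MAE-specific patch (assumed name), instead of tome.patch.timm
model.r = 8            # r=0 recovers the unmerged baseline
model.eval()
```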
Now, when running evaluation on an off-the-shelf MAE model, I get:
Acc@1 85.96 (for r=0)
Acc@1 84.22 (for r=8)
which matches Table 1 of the paper (which reports 84.25).
For timing (using `utils.benchmark`), I get:
[r=0] Throughput: 242.96 im/s
[r=8] Throughput: 474.59 im/s
1.95x speed-up