Open jameslahm opened 1 year ago
Thank you for your comments. In our evaluation, we measure the inference time of all MHSA and FFN modules in the model to estimate its throughput. We ran your code and compared the results, and found that your numbers may be affected by the token selection function, which is not well optimized. Thank you for helping us find this problem; we will try to optimize this function to make it faster.
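To illustrate why a module-level estimate and an end-to-end measurement can disagree, here is a minimal sketch. It assumes the estimate is obtained by summing per-module times for one batch; the thread does not say exactly how the times are aggregated. The function name, the module labels, and all timing values are hypothetical and chosen only for illustration.

```python
def estimate_throughput(module_times_s, batch_size):
    """Estimate images/second from per-module times (seconds per batch).

    Assumption: total time is the plain sum of the listed module times.
    Anything not listed (for example, token selection) is ignored.
    """
    total_s = sum(module_times_s.values())
    return batch_size / total_s


# Hypothetical timings for one batch of 100 images, not measured values.
mhsa_ffn_only = {"mhsa": 0.5, "ffn": 0.5}
with_selection = {"mhsa": 0.5, "ffn": 0.5, "token_selection": 1.0}

print(estimate_throughput(mhsa_ffn_only, 100))   # counts MHSA and FFN only
print(estimate_throughput(with_selection, 100))  # also counts selection overhead
```

Under these made-up numbers, leaving out an unoptimized step inflates the estimate, which is consistent with the explanation given above for the gap between the reported and locally measured throughput.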
Thank you for your great work! Table 2 of the paper reports that pruning DeiT-Tiny raises throughput from 2648.7 to 4496.2. In my local test, however, the pruned DeiT-Tiny's throughput (1819) is close to that of the original DeiT-Tiny (1760). I used the provided compressed DeiT-Tiny model (Acc@1: 71.6, https://drive.google.com/file/d/1NSq3SRxnObfl6oaFE5gHtjnhzm0Lfc6S/view?usp=sharing). My GPU is an RTX 3090 and the throughput code is below:
I wonder whether I did something wrong. Would you mind sharing the code you used to test throughput? Thanks a lot.
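The snippet from the original post is not preserved in this copy of the thread. For readers who want to reproduce an end-to-end measurement, here is a minimal, framework-agnostic sketch of a common approach: warm up, synchronize, time a fixed number of forward passes, synchronize again. It is not the poster's or the authors' code; `measure_throughput`, `step`, and `sync` are hypothetical names, and the default iteration counts are arbitrary.

```python
import time


def measure_throughput(step, batch_size, warmup=10, iters=50, sync=None):
    """Return items per second for `step`.

    `step` is a zero-argument callable that runs one forward pass on a
    batch of `batch_size` items. `sync` is an optional zero-argument
    callable that blocks until queued device work has finished.
    """
    # Warm-up passes are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # make sure warm-up work is done before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()  # wait for asynchronous work before stopping the clock
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload: each "forward pass" sleeps for about 1 ms.
    print(measure_throughput(lambda: time.sleep(0.001), batch_size=64))
```

With PyTorch on a GPU, one would typically call this as `measure_throughput(lambda: model(images), batch_size=images.shape[0], sync=torch.cuda.synchronize)` with the model in `eval()` mode and inside `torch.no_grad()`. Without the synchronization calls, CUDA kernels are launched asynchronously and the timer may stop before the work has finished.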