[ICCV 23] An approach to improve the efficiency of Vision Transformers (ViT) by jointly applying token pruning and token merging, with a differentiable compression rate.
87 stars, 8 forks
Inquiry on Fine-tuning Details for Table 2 in Your Repository #5
I would like to express my admiration for your work; it is truly straightforward and effective. However, I have run into an issue while trying to reproduce the "fine-tuning the model with the searched compression rate for 30 epochs" result reported in Table 2.
Specifically, after fine-tuning with the EViT framework, the accuracy unexpectedly decreased compared to the model before fine-tuning. Could you clarify the training details you used in your experiments? Any insights or guidance would be greatly appreciated.
Set the DropPath rate to 0.1 for DeiT-S and DeiT-B. Larger models should use a larger DropPath rate, for example 0.2 for ViT-L (MAE). The DropPath rate always follows the setting of the pre-trained model, so you can find the exact numbers in the official MAE repo.
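For reference, a minimal sketch of where this rate would be set, assuming a timm-based DeiT setup (the use of timm.create_model and the model name here are assumptions, not necessarily the repo's exact fine-tuning script):

```python
# Minimal sketch, assuming a timm-based DeiT/ViT setup; not the repo's exact script.
import timm

# drop_path_rate is the stochastic-depth (DropPath) rate discussed above:
# 0.1 for DeiT-S / DeiT-B, 0.2 for larger backbones such as ViT-L (MAE),
# following the pre-trained model's recipe.
model = timm.create_model(
    "deit_small_patch16_224",  # assumed model name for DeiT-S
    pretrained=True,
    drop_path_rate=0.1,
)
```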