[ICCV 23] An approach that enhances the efficiency of the Vision Transformer (ViT) by concurrently employing token pruning and token merging, with a differentiable compression rate.
The prune/merge token numbers for the last layer are out of range. #4
I tried to train the compression parameters on ImageNet, and the results are shown above. Using x[:197] when len(x) < 197 is actually fine, since Python will simply return the whole list. But I think the prune/merge results for the last layer might need to be fixed.
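The slicing behavior mentioned here can be verified directly: unlike indexing, slicing past the end of a Python sequence never raises an error, it just returns whatever elements exist. A minimal check (the token list here is hypothetical, shorter than the 197 tokens of a standard ViT):

```python
# Hypothetical token list with fewer than 197 entries.
tokens = list(range(100))

# Slicing past the end is safe: it returns all available elements.
clipped = tokens[:197]
assert clipped == tokens
assert len(clipped) == 100

# Plain indexing at the same position would fail instead.
try:
    tokens[197]
except IndexError:
    print("indexing raises, slicing does not")
```

So the out-of-range prune/merge counts do not crash the slicing code path, which is consistent with the observation that only the reported numbers for the last layer need fixing.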