OpenGVLab / DiffRate

[ICCV 23] An approach to improving the efficiency of Vision Transformers (ViT) by jointly applying token pruning and token merging, with a differentiable compression rate.

The prune/merge token numbers for the last layer are out of range. #4

Closed by Bostoncake 1 month ago

Bostoncake commented 5 months ago

[Screenshot of the training results; image not preserved]

I trained the compression parameters on ImageNet; the results are shown above. Using x[:197] when len(x) < 197 is harmless, because Python slicing simply returns the whole list. However, I think the prune/merge values for the last layer may need to be fixed.
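A minimal illustration of why the oversized slice is harmless: Python clamps an out-of-range slice bound to the sequence length instead of raising, while direct indexing past the end does raise. The value 197 matches the usual ViT/DeiT token count at 224×224 with 16×16 patches (196 patches plus the class token); the 150-token list is a hypothetical stand-in for the tokens remaining at some layer. NumPy arrays and PyTorch tensors clamp slices along a dimension in the same way.

```python
# Hypothetical: only 150 tokens remain at this layer.
tokens = list(range(150))

# Slicing clamps the stop index 197 to len(tokens), so no error is raised
# and the whole list comes back unchanged.
kept = tokens[:197]
assert kept == tokens

# Indexing (rather than slicing) past the end does raise.
try:
    tokens[197]
except IndexError:
    print("tokens[197] raises IndexError")
```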

ChenMnZ commented 4 months ago

This does not affect correctness, because token compression is not executed in the last block.

In fact, in the last layer we could even compress away every token except the class token.
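A toy sketch of why compressing the last block cannot change the prediction. It assumes, without checking DiffRate's code, that tokens are pruned/merged after the attention step and before the MLP (as in ToMe-style designs), that the MLP acts on each token independently, and that the classifier head reads only the class token at index 0. `attention_mix`, `token_wise_mlp`, and `last_block` are hypothetical stand-ins, not DiffRate functions.

```python
import math


def attention_mix(tokens):
    # Toy self-attention with no projections: each token becomes a
    # softmax-weighted average of all tokens, weighted by dot-product similarity.
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * k[d] for wi, k in zip(w, tokens)) / z for d in range(len(q))])
    return out


def token_wise_mlp(tokens):
    # Toy MLP: a ReLU of an affine map applied to each token independently.
    return [[max(0.0, 2 * x - 1) for x in t] for t in tokens]


def last_block(tokens, keep):
    mixed = attention_mix(tokens)  # attention still sees every token
    mixed = mixed[:keep]           # hypothetical compression; class token stays at index 0
    return token_wise_mlp(mixed)


# Hypothetical 197-token input (class token at index 0), 3 dims per token.
tokens = [[0.1 * i, 0.2, -0.05 * i] for i in range(197)]

full = last_block(tokens, keep=197)[0]    # class token without compression
cls_only = last_block(tokens, keep=1)[0]  # class token after dropping all others
assert full == cls_only
```

Here keep=1 drops all 196 non-class tokens. The class token's output is unchanged because it was computed before the drop and the MLP never mixes tokens. If compression instead happened before attention, the class token would attend to fewer tokens and the result would generally differ.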