facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Is ToMe sensitive to the image resolution? #16

Closed: numb3r3 closed this issue 1 year ago

numb3r3 commented 1 year ago

Just curious: is the patch merging approach sensitive to the resolution of the input image?

dbolya commented 1 year ago

We found in the paper that ToMe works better on larger images than on smaller ones: the accuracy drop is lower and the speed-up is greater. Of course, since larger images produce more tokens, you have to increase the number of tokens reduced per layer accordingly.
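
To make the token arithmetic concrete, here is a minimal sketch of how the token count changes with resolution and with the number of tokens reduced per layer. Everything in it is an assumption for illustration, not taken from this thread: a 16-pixel patch size, 12 transformer layers, one class token that is never merged, a constant per-layer reduction `r`, and a cap of at most half the patch tokens removed per layer. The helper names (`num_patch_tokens`, `tokens_after_merging`, `scaled_r`) are hypothetical, and the proportional scaling rule is only a heuristic starting point, not a recommendation from the authors.

```python
def num_patch_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens for a square image (class token excluded)."""
    return (image_size // patch_size) ** 2


def tokens_after_merging(image_size: int, r: int, num_layers: int = 12,
                         patch_size: int = 16) -> list[int]:
    """Token count before the first layer and after each layer.

    Assumption: each layer removes r tokens, capped at half of the
    current patch tokens; the class token is never removed.
    """
    tokens = num_patch_tokens(image_size, patch_size) + 1  # +1 class token
    history = [tokens]
    for _ in range(num_layers):
        removable = (tokens - 1) // 2  # at most half of the patch tokens
        tokens -= min(r, removable)
        history.append(tokens)
    return history


def scaled_r(base_r: int, base_size: int, new_size: int,
             patch_size: int = 16) -> int:
    """Heuristic: scale r in proportion to the number of patch tokens."""
    ratio = (num_patch_tokens(new_size, patch_size)
             / num_patch_tokens(base_size, patch_size))
    return round(base_r * ratio)


if __name__ == "__main__":
    for size, r in [(224, 8), (384, 8), (384, scaled_r(8, 224, 384))]:
        history = tokens_after_merging(size, r)
        kept = history[-1] / history[0]
        print(f"size={size} r={r}: {history[0]} -> {history[-1]} tokens "
              f"({kept:.0%} kept)")
```

Under these assumptions, keeping the same `r` at a higher resolution leaves a much larger share of tokens unmerged, which is why the per-layer reduction has to grow with the image size to get a comparable relative reduction.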