chinhsuanwu / mobilevit-pytorch

A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer"
https://arxiv.org/abs/2110.02178
MIT License
501 stars, 70 forks

Could someone help with pointers/code to visualize the attention maps generated by MobileViT? #15

Open arunsubk opened 1 year ago

arunsubk commented 1 year ago

Hi @chinhsuanwu / All,

Could you please help with pointers or code to visualize the attention maps generated by MobileViT? With a regular ViT, the attention map can be rescaled to the image size and interpolated, but I am finding it hard to figure out how to generate a visualization from an attention map of size torch.Size([1, 4, 4, 16, 16]) for a single image. I understand that 16x16 is the attention and that the 4 at index 1 is the heads, but I am unable to wrap my head around the 4 at index 2 and how it would contribute to plotting the visualization. This is a bit urgent, and it would really help if anyone has thoughts on this.

Regards, Arun
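
One possible way to approach this is sketched below; it is not the repository author's method. It assumes the tensor is laid out as (batch, pixels-per-patch, heads, query patches, key patches), which is what a `b p h n d` attention layout combined with a `b d (h ph) (w pw) -> b (ph pw) (h w) d` unfolding would produce. Under that assumption, one of the two 4s is the number of heads and the other is the ph * pw = 4 pixel positions inside each 2x2 patch, and 16 is a 4x4 grid of patches. Since both axes have size 4 here, the shape alone cannot tell you which is which, so please check this against your model's forward pass. The function name `fold_attention` and its parameters are hypothetical.

```python
import torch
import torch.nn.functional as F


def fold_attention(attn, query_index=None, ph=2, pw=2, image_size=(256, 256)):
    """Turn a (1, P, heads, N, N) attention tensor into an image-sized heat map.

    Assumed layout (not confirmed by the thread): axis 1 holds the
    P = ph * pw pixel positions inside each patch, axis 2 holds the heads,
    and N = h * w patches form a square grid.
    """
    _, p, heads, n, _ = attn.shape
    assert p == ph * pw, "axis 1 is assumed to index pixels within a patch"
    h = w = int(n ** 0.5)
    assert h * w == n, "a square patch grid is assumed"

    a = attn[0].mean(dim=1)           # average over heads -> (P, N_query, N_key)
    if query_index is None:
        a = a.mean(dim=1)             # average over query patches -> (P, N_key)
    else:
        a = a[:, query_index, :]      # attention from one query patch -> (P, N_key)

    # Undo '(ph pw) (h w)' -> '(h ph) (w pw)' to recover the feature-map grid.
    a = a.reshape(ph, pw, h, w).permute(2, 0, 3, 1).reshape(h * ph, w * pw)

    # Upsample the small grid (8x8 for this shape) to the input image size.
    a = F.interpolate(a[None, None], size=image_size,
                      mode="bilinear", align_corners=False)[0, 0]

    # Rescale to [0, 1] so the map can be drawn as a heat map.
    return (a - a.min()) / (a.max() - a.min() + 1e-8)
```

The returned map can then be drawn over the input image, for example with matplotlib's `imshow` and an `alpha` value below 1. Passing `query_index` shows where a single patch attends; leaving it as `None` averages over all query patches. If the heads turn out to sit on axis 1 in your model, swapping axes 1 and 2 of the tensor before calling the function should be enough.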