Hi @chinhsuanwu / All,
Could you please share pointers or code for visualizing the attention maps generated by MobileViT? With a regular ViT, the attention map can be interpolated and rescaled to the image size, but I am finding it hard to figure out how to generate a visualization from an attention map of size torch.Size([1, 4, 4, 16, 16]) for a single image. I understand that 16x16 is the attention matrix and that the 4 at index 1 is the number of heads, but I can't wrap my head around the 4 at index 2 and how it would contribute to plotting the visualization. This is a bit urgent, so any thoughts would really help.
Regards, Arun
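
One possible way to approach this, offered as a rough sketch and not a confirmed answer: it assumes the tensor is laid out as (batch, pixel positions inside a patch, heads, patches, patches), with 2x2 patches on a 4x4 patch grid, so that 16 = 4 * 4 patches and one of the 4s is 2 * 2 pixel positions. Since both middle dimensions are 4, the shape alone cannot tell which one is the heads; if index 1 is the heads, swap axes 1 and 2 before calling the function. The name `attention_to_map` and all its parameters are hypothetical, and nearest-neighbour upscaling stands in for proper interpolation to keep the example NumPy-only.

```python
import numpy as np

def attention_to_map(attn, query_patch, ph=2, pw=2, grid_h=4, grid_w=4, image_size=None):
    """Fold attention of shape (P, heads, N, N) into a 2D map for one query patch.

    ASSUMED layout (not confirmed): P = ph * pw pixel positions inside a patch,
    N = grid_h * grid_w patches, with P ordered as (ph pw) and N as (h w).
    """
    attn = np.asarray(attn, dtype=np.float64)
    P, heads, N, _ = attn.shape
    assert P == ph * pw and N == grid_h * grid_w, "shape does not match assumed layout"

    # Average over heads -> (P, N, N), then keep the row of the chosen query -> (P, N).
    rows = attn.mean(axis=1)[:, query_patch, :]

    # (ph, pw, h, w) -> (h, ph, w, pw) -> (h * ph, w * pw): every pixel position
    # goes back to its place inside its patch on the feature map.
    fmap = rows.reshape(ph, pw, grid_h, grid_w).transpose(2, 0, 3, 1)
    fmap = fmap.reshape(grid_h * ph, grid_w * pw)

    if image_size is not None:
        # Nearest-neighbour upscaling; assumes image_size is a multiple of the map size.
        sh = image_size // fmap.shape[0]
        sw = image_size // fmap.shape[1]
        fmap = np.kron(fmap, np.ones((sh, sw)))
    return fmap

# Demo with random softmax-normalised attention in place of real model output.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 4, 4, 16, 16))
demo_attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
demo_map = attention_to_map(demo_attn[0], query_patch=5, image_size=256)
```

With a real model, the tensor would first be converted with something like `attn.detach().cpu().numpy()[0]`, and the resulting map could be overlaid on the image with matplotlib's `imshow` using an `alpha` value. Averaging over heads is only one choice; plotting each head separately may be more informative.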