OliverRensu / Shunted-Transformer


How can I produce a visualized image from Shunted Transformer? #2

Open MingfangDeng opened 2 years ago

MingfangDeng commented 2 years ago

I want to see the result (an image) after Shunted Transformer, but the output is a tensor and its channel count is not 3. I don't know how to
obtain an image from the Shunted Transformer output. I would appreciate any help. Thank you very much.

OliverRensu commented 2 years ago

Hi, we follow the code in these repositories: https://github.com/facebookresearch/dino/blob/main/visualize_attention.py and https://github.com/hila-chefer/Transformer-Explainability
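
For readers landing here with the same question: the output tensor holds feature channels, not RGB, so it cannot be saved as an image directly. The DINO script instead visualizes attention weights. Below is a rough NumPy sketch of that idea, not the authors' code. It assumes an attention array of shape (heads, 1 + N, 1 + N) with the CLS token at index 0; the function name `cls_attention_heatmaps`, the toy random input, and the min-max normalization are illustrative assumptions.

```python
import numpy as np

def cls_attention_heatmaps(attn, grid_hw, patch_size):
    """Turn CLS-token attention into one heatmap per head.

    attn: (heads, 1 + N, 1 + N) softmax attention from one block,
          with the CLS token at index 0 (assumed layout).
    grid_hw: (h, w) patch grid, with h * w == N.
    """
    heads = attn.shape[0]
    h, w = grid_hw
    # Row 0 is the CLS query; drop column 0 (CLS attending to itself).
    cls_to_patches = attn[:, 0, 1:]                      # (heads, N)
    maps = cls_to_patches.reshape(heads, h, w)
    # Nearest-neighbour upsampling back to pixel resolution.
    maps = maps.repeat(patch_size, axis=1).repeat(patch_size, axis=2)
    # Min-max normalise each head to [0, 1] for display (an added convenience).
    lo = maps.min(axis=(1, 2), keepdims=True)
    hi = maps.max(axis=(1, 2), keepdims=True)
    return (maps - lo) / (hi - lo + 1e-8)

# Toy input: random attention for 6 heads over a 14x14 patch grid.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 1 + 14 * 14, 1 + 14 * 14))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
heatmaps = cls_attention_heatmaps(attn, grid_hw=(14, 14), patch_size=16)
```

Each entry of `heatmaps` is a single-channel 224x224 array that can be written out with a colormap, for example via matplotlib's `plt.imsave`.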

rayleizhu commented 2 years ago

  1. As far as I know, each heatmap corresponds to a query point. So what is the query point for the visualization in Figure 3?
  2. Which level of attention block did you use for the visualization?

go-ahead-maker commented 1 year ago

I find that in DINO, the default backbone is ViT or DeiT, which uses a CLS token to represent image-level information, so the visualization corresponds to the CLS token's attention. Shunted Transformer does not use a CLS token, so it may be difficult to use the attention map to reflect the RoI of the whole image. Grad-CAM may be an alternative.
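
To make the Grad-CAM suggestion concrete, here is a minimal NumPy sketch of its combination step. It is an illustration under stated assumptions, not code from this repository: it assumes you have already captured one stage's activations and the gradient of the target class score with respect to them (in PyTorch, typically via forward and backward hooks), and reshaped the tokens from (N, C) to (C, H, W). The function name `grad_cam` and the random toy arrays are hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM combination step for a single image.

    activations: (C, H, W) feature map from the chosen stage.
    gradients:   (C, H, W) gradient of the class score w.r.t. activations.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))                # (C,)
    # Weighted sum of the activation channels.
    cam = np.tensordot(weights, activations, axes=1)     # (H, W)
    # Keep only regions with positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] unless the map is all zeros.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy stand-ins for hooked activations and gradients (assumed shapes).
rng = np.random.default_rng(0)
acts = rng.normal(size=(512, 7, 7))
grads = rng.normal(size=(512, 7, 7))
cam = grad_cam(acts, grads)
```

The resulting map has the stage's spatial resolution, so it still needs to be upsampled to the input size before being overlaid on the image.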