Wrong visualization results of ViT-B/16 in gradcam_clip.ipynb

clip-vil / CLIP-ViL

[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383

Wrong visualization results of ViT-B/16 in gradcam_clip.ipynb #35

Open wangq95 opened 11 months ago

wangq95 commented 11 months ago

Hello, I get wrong visualization results when I run gradcam_clip.ipynb after replacing the default ViT-B/32 with "ViT-B/16". The cases are shown below:

Why does the visualization fail after switching the model architecture?
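One thing that might be worth checking (a guess, not something confirmed from the notebook): the two models split the image into a different number of patches, so any step that reshapes patch tokens into a fixed-size grid for the heatmap would need a different size. The sketch below only does the arithmetic. The 224-pixel input size, the single class token, and the helper names `patch_grid` and `tokens_to_grid_side` are assumptions for illustration, not code from gradcam_clip.ipynb.

```python
import math
from typing import Tuple


def patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Return (grid side, number of patch tokens) for a square ViT input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side * side


def tokens_to_grid_side(num_tokens: int, has_cls_token: bool = True) -> int:
    """Infer the grid side from the token count instead of hard-coding it.

    Assumes at most one extra (class) token in front of the patch tokens.
    """
    num_patches = num_tokens - 1 if has_cls_token else num_tokens
    side = math.isqrt(num_patches)
    if side * side != num_patches:
        raise ValueError("patch tokens do not form a square grid")
    return side


# Assumed input resolution of 224 x 224 for both models.
for name, patch in [("ViT-B/32", 32), ("ViT-B/16", 16)]:
    side, num_patches = patch_grid(224, patch)
    print(f"{name}: {side} x {side} grid, {num_patches} patch tokens")
```

If the visualization code reshapes the token dimension to a grid size that only matches ViT-B/32, deriving the side length from the actual token count, as in `tokens_to_grid_side`, would be one way to make it independent of the patch size.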