catherinesyeh / attention-viz

Visualizing query-key interactions in language + vision transformers
http://attentionviz.com/
MIT License

text to image view #70

Open 18445864529 opened 1 year ago

18445864529 commented 1 year ago

First, thank you for the great work!

I would like to know whether this tool can also produce text-to-image attention views for large vision-language models such as MiniGPT-4, LLaVA, and InstructBLIP.

Thanks!

catherinesyeh commented 1 year ago

Thanks so much for checking out AttentionViz! We have not tried visualizing text-to-image attention yet, but I think our tool/technique can feasibly be extended to vision-language models. This is definitely a great direction for future work.
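
For readers wondering what such an extension might involve, here is a rough sketch of the underlying computation. It is not part of AttentionViz and not a confirmed design. It assumes you have already extracted per-token query vectors for the text and per-patch key vectors for the image from a single attention head of a vision-language model. The function name `text_to_image_attention`, its parameters, and the random stand-in data are all hypothetical.

```python
import math
import random


def text_to_image_attention(text_queries, image_keys, grid_shape):
    """Return one attention heatmap over image patches per text token.

    text_queries: list of query vectors, one per text token
    image_keys:   list of key vectors, one per image patch (row-major order)
    grid_shape:   (rows, cols) of the patch grid; rows * cols == number of patches
    """
    rows, cols = grid_shape
    if rows * cols != len(image_keys):
        raise ValueError("grid_shape must match the number of image patches")
    d = len(image_keys[0])

    heatmaps = []
    for q in text_queries:
        # Scaled dot product between this text query and every patch key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in image_keys]
        # Numerically stable softmax over the patches.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Lay the flat weights back out on the patch grid.
        heatmaps.append([weights[r * cols:(r + 1) * cols] for r in range(rows)])
    return heatmaps


# Hypothetical random vectors standing in for ones extracted from a real model.
random.seed(0)
queries = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]   # 3 text tokens
keys = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]     # 4 x 4 patches
heatmaps = text_to_image_attention(queries, keys, (4, 4))
```

Each entry of `heatmaps` is a grid of weights that sums to 1 and could be overlaid on the input image for the corresponding text token. How to obtain the query and key vectors depends on the specific model and is not covered here.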