Attention map from vision part

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Apache License 2.0


Attention map from vision part #511

Open Rizwan2613 opened 2 weeks ago

Rizwan2613 commented 2 weeks ago

Hey,

Is there a way to get an attention map for images? I want to find the areas of an image that contribute to a particular output. I have tried qwen2-vl-detection on Hugging Face to get bounding boxes (which is what I need), but it does not work well on documents.
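For illustration, the post-processing step implied here (going from per-token attention scores to a bounding box) could look like the sketch below. It is only a sketch under stated assumptions: `attention_to_bbox`, `scores`, `cell_px`, and `keep` are hypothetical names, the scores are assumed to be already available as one value per visual token in row-major order, and the 28-pixel cell size assumes 14-pixel patches merged 2x2. How to obtain such scores from Qwen2-VL is exactly the open question and is not shown.

```python
import numpy as np

def attention_to_bbox(scores, grid_h, grid_w, cell_px=28, keep=0.5):
    """Turn per-token attention scores into a heatmap and a pixel bounding box.

    scores  : 1-D sequence of length grid_h * grid_w (hypothetical input,
              one attention score per visual token, row-major order).
    cell_px : assumed side length in pixels covered by one visual token.
    keep    : threshold on the min-max normalized heatmap.
    """
    heat = np.asarray(scores, dtype=float).reshape(grid_h, grid_w)

    # Min-max normalize so the threshold does not depend on the raw scale.
    value_range = heat.max() - heat.min()
    if value_range > 0:
        heat = (heat - heat.min()) / value_range
    else:
        heat = np.zeros_like(heat)

    # Grid cells whose normalized score reaches the threshold.
    rows, cols = np.nonzero(heat >= keep)
    if rows.size == 0:
        return heat, None

    # Tight box around the selected cells, converted to pixel coordinates.
    x0, y0 = int(cols.min()) * cell_px, int(rows.min()) * cell_px
    x1, y1 = (int(cols.max()) + 1) * cell_px, (int(rows.max()) + 1) * cell_px
    return heat, (x0, y0, x1, y1)
```

The returned box is in the coordinates of the resized image that was fed to the vision encoder, so it would still need to be scaled back to the original document size.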

Any help will be appreciated.