boschresearch / Divide-and-Bind

Official implementation of "Divide & Bind Your Attention for Improved Generative Semantic Nursing" (BMVC 2023 Oral)
https://sites.google.com/view/divide-and-bind
GNU Affero General Public License v3.0

Attention Visualization #6

Closed: FlynnSpace closed this issue 11 months ago

FlynnSpace commented 11 months ago

Hi, thanks for sharing the Jupyter code. I have some questions about the Attention Visualization part. When I debug it, I notice the shape of `attn_map` is (16, 256, 77), so what does the number 16 mean? I also see the code slices it as `load_attn_dict[t]['down_cross'][4][8:]`, and I'd like to know why only the last 8 entries along the first dimension are kept.
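
For reference, this is roughly what I am inspecting (paraphrased from the notebook, so the exact cell may differ; `t` here is one of the saved timesteps):

```python
attn_map = load_attn_dict[t]['down_cross'][4]
print(attn_map.shape)  # torch.Size([16, 256, 77]) -- where does 16 come from?

sliced = load_attn_dict[t]['down_cross'][4][8:]  # why keep only the last 8?
print(sliced.shape)    # torch.Size([8, 256, 77])
```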

I look forward to your reply, and thank you for taking the time to answer my questions!

YumengLi007 commented 11 months ago

Hi @FlynnSpace , the slicing is due to classifier-free guidance, see https://github.com/boschresearch/Divide-and-Bind/blob/961b9ab29326b1dde2608f4355695e4ce06e4d3a/divide_and_bind/pipeline_divide_and_bind_latest.py#L1098. The first 8 attention maps are produced with the empty (unconditional) prompt and the last 8 with the actual input prompt, so in total there are 16 attention maps.
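
A rough sketch of the layout for illustration (not the exact notebook code; it assumes the SD 1.x UNet's 8 cross-attention heads, a 16x16 cross-attention layer, i.e. 256 spatial positions, and CLIP's 77 text tokens):

```python
import torch

# Stand-in for load_attn_dict[t]['down_cross'][4]; the real tensor holds
# the stored cross-attention probabilities.
attn_map = torch.rand(16, 256, 77)

# With classifier-free guidance the batch is duplicated, so the first
# dimension is 2 prompts x 8 heads = 16:
#   rows 0..7  -> heads attending to the empty (unconditional) prompt
#   rows 8..15 -> heads attending to the actual input prompt
cond_maps = attn_map[8:]  # (8, 256, 77), conditional branch only

# For visualization: average over the heads, pick one text token, and
# fold the 256 spatial positions back into a 16x16 map.
token_idx = 1  # hypothetical choice: first prompt token after <BOS>
token_map = cond_maps.mean(dim=0)[:, token_idx].reshape(16, 16)
print(token_map.shape)  # torch.Size([16, 16])
```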

FlynnSpace commented 11 months ago

Thanks, that helps a lot!