Closed: FlynnSpace closed this issue 11 months ago.
Hi @FlynnSpace, the slicing is due to classifier-free guidance, see https://github.com/boschresearch/Divide-and-Bind/blob/961b9ab29326b1dde2608f4355695e4ce06e4d3a/divide_and_bind/pipeline_divide_and_bind_latest.py#L1098 The first 8 attention maps are produced with the empty (unconditional) prompt and the last 8 with the input prompt, so in total there are 16 attention maps.
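Roughly, the split looks like this (a minimal sketch; the random tensor is just a stand-in for `load_attn_dict[t]['down_cross'][4]`, and 8 is the number of attention heads per prompt):

```python
# Sketch of how the 16 maps split under classifier-free guidance:
# the UNet runs on a doubled batch (empty prompt + input prompt),
# so the maps stack as [8 unconditional, 8 conditional].
import torch

attn_map = torch.rand(16, 256, 77)        # placeholder for load_attn_dict[t]['down_cross'][4]

uncond_maps = attn_map[:8]                # heads for the empty prompt
cond_maps = attn_map[8:]                  # heads for the actual input prompt
print(uncond_maps.shape, cond_maps.shape) # torch.Size([8, 256, 77]) each
```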
Thanks, that helps a lot!
Hi, thanks for sharing the Jupyter code. I have some questions about the Attention Visualization part: when I debug that part, I notice the shape of `attn_map` is (16, 256, 77), so what does the number 16 mean? I also see that it slices the tensor with `load_attn_dict[t]['down_cross'][4][8:]`, and I want to know why the last 8 maps are chosen. Looking forward to your reply, and thank you for taking the time to answer my questions!
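For reference, this is roughly how I am trying to inspect those sliced maps (just a rough sketch on random data; `token_idx` is an arbitrary example index, and averaging over heads is my own choice):

```python
# Hypothetical visualization sketch: average the 8 conditional maps over heads,
# pick one text token, and reshape the 256 spatial positions to a 16x16 grid.
import torch
import matplotlib.pyplot as plt

cond_maps = torch.rand(8, 256, 77)             # stand-in for load_attn_dict[t]['down_cross'][4][8:]
token_idx = 2                                   # example: index of the text token to inspect

heatmap = cond_maps.mean(dim=0)[:, token_idx]   # (256,) attention weights for that token
heatmap = heatmap.reshape(16, 16)               # 256 = 16 x 16 spatial resolution

plt.imshow(heatmap.cpu().numpy(), cmap='viridis')
plt.axis('off')
plt.show()
```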