castorini / daam

Diffusion attentive attribution maps for interpreting Stable Diffusion.
MIT License

Clarification needed #32

Closed · devnkong closed this issue 1 year ago

devnkong commented 1 year ago

Thanks for your great work! I want to know why we need the operation below. I see that we only keep half of the attention maps: for example, if we have 8 heads, then `map_.size(0)` below is 16. But why do we have 16 in the first place, given that each transformer block only has 8 heads? Can you show me where diffusers does this? Really confused, thank you!
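For context, here is roughly how I am inspecting the shape (a made-up snippet with illustrative names, not the actual hook in trace.py):

```python
import torch

heads = 8  # number of heads reported by the cross-attention block I hooked


def inspect(map_: torch.Tensor) -> None:
    # map_ is the attention-probability tensor the hook receives,
    # shaped (something * heads, query_tokens, text_tokens)
    print(map_.size(0))  # prints 16 == 2 * heads, which is what confuses me


# dummy tensor with the leading size I observe in practice
inspect(torch.rand(16, 64 * 64, 77))
```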

https://github.com/castorini/daam/blob/119d8ff1dd4e61ef579824f3112fb0010eb2fff0/daam/trace.py#L215

daemon commented 1 year ago

Ah yes, so the first half of the heads in the earlier layers operates on the unconditional latent embeddings (of classifier-free guidance), initialized here: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L639. Since we care about the text-conditional embeddings only, we throw away those nuisance attention heads. You can verify that this procedure is sensible by visualizing the unconditional heads, e.g., `map_ = map_[:map_.size(0) // 2]`.
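For anyone landing here later, a minimal sketch of what is going on, assuming the usual diffusers classifier-free-guidance setup (tensor shapes and variable names below are illustrative, not copied from either codebase):

```python
import torch

heads = 8                                   # heads in one cross-attention block
uncond_embeds = torch.randn(1, 77, 768)     # embedding of the "" (unconditional) prompt
text_embeds = torch.randn(1, 77, 768)       # embedding of the actual prompt

# Classifier-free guidance: diffusers stacks both prompts into one batch,
# so the UNet (and every attention block inside it) sees a batch of 2.
encoder_hidden_states = torch.cat([uncond_embeds, text_embeds])  # (2, 77, 768)

# Inside the attention block, the batch and head dimensions are folded
# together, so the attention map arrives as (batch * heads, queries, tokens):
map_ = torch.rand(encoder_hidden_states.size(0) * heads, 64 * 64, 77)
print(map_.size(0))                         # 16, even though the block has 8 heads

# DAAM only cares about the text-conditional half, so the hook drops the
# first heads' worth of maps (the unconditional ones):
cond_maps = map_[map_.size(0) // 2:]        # (8, 4096, 77)

# To inspect the unconditional maps instead, keep the first half:
uncond_maps = map_[:map_.size(0) // 2]      # (8, 4096, 77)
```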

devnkong commented 1 year ago

Thank you so much!