castorini / daam

Diffusion attentive attribution maps for interpreting Stable Diffusion.
MIT License
696 stars 64 forks

DAAM with mu #40

Closed: andreemic closed 7 months ago

andreemic commented 1 year ago

Hey! Great job on this repo! Very clean documentation and a useful idea.

daemon commented 1 year ago

Hey, thanks. I may be wrong, as I'm not too familiar with the InstructPix2Pix architecture, but I think focusing on the cross-attention heads between the text embeddings (the keys) and the usual latent embeddings (the queries) could work. If the attention key vectors are instead a concatenation of text embeddings and, say, image embeddings, then you could look at the cross-attention restricted to the text positions. If the text and image embeddings are inseparable (e.g., multimodal fusion), then that would likely be outside the scope of DAAM/cross-attention and require a separate set of techniques.
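To make the concatenated-keys case concrete, here is a minimal sketch in plain Python of what "cross-attention restricted to the text positions" could look like. It is an illustration under assumptions, not DAAM's implementation or the InstructPix2Pix code: it assumes the text keys occupy the first `num_text_tokens` positions of the key sequence, and the names `text_restricted_attention` and `token_heat_map` are hypothetical. Real code would operate on tensors taken from the model's attention layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_restricted_attention(queries, keys, num_text_tokens, renormalize=False):
    """Attention weights from each latent query to the text keys only.

    Assumption (hypothetical layout): `keys` is the concatenation
    [text keys..., image keys...], with the text keys first.
    The softmax is taken over ALL keys, then the text columns are sliced out.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    maps = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        text_weights = softmax(scores)[:num_text_tokens]
        if renormalize:
            # Optional: make the text weights sum to 1 per latent position.
            total = sum(text_weights)
            text_weights = [w / total for w in text_weights]
        maps.append(text_weights)
    return maps


def token_heat_map(maps, token_index, height, width):
    """Reshape one token's column of weights into a height x width grid."""
    column = [row[token_index] for row in maps]
    return [column[r * width:(r + 1) * width] for r in range(height)]


# Toy example: a 2x2 latent grid (4 queries), 3 text keys followed by 2 image keys.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

maps = text_restricted_attention(queries, keys, num_text_tokens=3)
heat_map = token_heat_map(maps, token_index=1, height=2, width=2)
```

Without `renormalize`, each row sums to less than 1 because part of the attention mass goes to the image keys; whether to renormalize depends on whether you want the maps to reflect how much the model attends to text at all, or only how the text attention is distributed across tokens.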

nityanandmathur commented 8 months ago

@andreemic Please let me know if you were able to generate cross-attention maps for IP2P or ControlNet.

I am trying to visualize cross-attention maps for the Stable Diffusion image-to-image pipeline and am facing the same errors.

nityanandmathur commented 8 months ago

@daemon Opened a pull request which fixes this. Please have a look.

https://github.com/castorini/daam/pull/60