castorini / daam

Diffusion attentive attribution maps for interpreting Stable Diffusion.
MIT License
689 stars 63 forks

Add support for InstructPix2Pix Fixes #40 #60

Closed nityanandmathur closed 7 months ago

nityanandmathur commented 7 months ago

The original DAAM splits the attention maps into 2 chunks: one for the text-conditioned pass and one for the unconditional pass. It then uses only the text-conditioned chunk to compute the cross-attention maps.

InstructPix2Pix has 3 chunks: one for the text condition, one for the image condition, and one for the unconditional pass.

The modified DAAM checks whether the size of map_ is 24 instead of the original 16. If so, it divides the maps into 3 chunks instead of 2, which allows the attention maps to be generated correctly.
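A minimal sketch of the size check described above, not the actual diff from this PR. The helper name `select_text_chunk` and the `text_chunk_index` parameter are hypothetical. The default chunk positions are also assumptions (last chunk in the 2-chunk case, first chunk in the 3-chunk case) and should be verified against the pipeline in use. The function only relies on `len()` and slicing, so it accepts a plain list here and should behave the same on a tensor indexed along its first dimension.

```python
def select_text_chunk(map_, text_chunk_index=None):
    """Return the slice of `map_` assumed to hold the text-conditioned maps.

    Hypothetical helper: `map_` is any sequence sliceable along its first axis.
    """
    size = len(map_)
    # Size 24 -> three guidance chunks (InstructPix2Pix); otherwise two (original DAAM).
    num_chunks = 3 if size == 24 else 2
    chunk = size // num_chunks
    if text_chunk_index is None:
        # ASSUMPTION: text-conditioned chunk is first of 3, or last of 2.
        text_chunk_index = 0 if num_chunks == 3 else 1
    start = text_chunk_index * chunk
    return map_[start:start + chunk]


# Stand-ins for attention maps: integers mark positions along the first axis.
two_chunk = list(range(16))
three_chunk = list(range(24))

print(select_text_chunk(two_chunk))    # second half of the 16 entries
print(select_text_chunk(three_chunk))  # first third of the 24 entries
```

Keeping the chunk position as a parameter avoids hard-coding an ordering that may differ between pipelines. Comparing against the literal 24 mirrors the description above, but it ties the check to one particular head count and batch size.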

[Image: Meeting Whiteboard (1)]

Old attention maps (all maps have the same output): [image: old]

New attention maps: [images: Image1, Image2, Image3, image (6), image (7), image (8)]