Add support for InstructPix2Pix

Fixes #40

The original DAAM splits the attention maps into 2 chunks, one for the text-conditioned pass and one for the unconditional pass. It then uses the text-conditioned chunk to compute the cross-attention maps.

InstructPix2Pix has 3 chunks: one for the text condition, one for the image condition, and one unconditional.

The modified DAAM checks whether the size of map_ is 24 instead of the original 16. If so, it splits the maps into 3 chunks instead of 2, so that the attention maps are generated correctly.
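The check described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the function name `select_chunk` and the `chunk_index` parameter are hypothetical, and the caller has to supply which chunk holds the text-conditioned maps because that depends on how the pipeline orders its batch. The sizes 16 and 24 are the ones mentioned above.

```python
def select_chunk(map_, chunk_index):
    """Split map_ along its first axis into equal chunks and return one.

    Hypothetical helper: works on anything that supports len() and
    slicing on the first axis (lists, NumPy arrays, torch tensors).
    """
    size = len(map_)
    if size == 24:
        num_chunks = 3  # InstructPix2Pix: text, image, unconditional
    elif size == 16:
        num_chunks = 2  # original case: text, unconditional
    else:
        raise ValueError(f"unexpected map_ size: {size}")

    if not 0 <= chunk_index < num_chunks:
        raise IndexError(f"chunk_index must be in [0, {num_chunks})")

    chunk_len = size // num_chunks
    start = chunk_index * chunk_len
    return map_[start:start + chunk_len]


# Stand-in data: integers instead of real attention maps.
three_way = list(range(24))
two_way = list(range(16))

print(len(select_chunk(three_way, 0)))  # 8 entries per chunk
print(len(select_chunk(two_way, 1)))    # 8 entries per chunk
```

Passing the chunk position explicitly keeps the sketch neutral about batch ordering; in real use it would have to match the order in which the pipeline concatenates its conditioned and unconditional inputs.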


Old attention maps (all maps have the same output): [image: old]

New attention maps: [images: Image1, Image2, Image3]
