bloc97 / CrossAttentionControl

Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion
MIT License

Relating to the recent paper on the 'Self-guidance' method #29

Open lindapu-1 opened 1 year ago

lindapu-1 commented 1 year ago

Hello @bloc97,

Your work has been instrumental in my understanding of the topic, especially since I encountered some difficulties when trying to run the official prompt to prompt code.

Recently, I've been engrossed in a paper titled "Diffusion Self-Guidance for Controllable Image Generation" (https://dave.ml/selfguidance/), where the authors introduce a novel 'Self Guidance' method. This technique edits an image by manipulating the attention maps, and I notice its resemblance to the 'Prompt to Prompt' method.

As an undergraduate student eager to delve deeper into Computer Vision, I'm interested in implementing this 'Self Guidance' method for my project. However, as of now, the authors have not released their official code, so I'm considering implementing the self-guidance method on top of your code.

Given your expertise in this area, I was wondering if you think it's feasible to implement the 'Self Guidance' method based on your code? Any insights or suggestions you could provide would be immensely appreciated.

bloc97 commented 1 year ago

Hi, after skimming through the paper I think it shouldn't be too difficult to implement the guidance operators described there. If I understood it correctly, they apply transformations to the 1D intensity space f(x,y) and the 2D coordinate space (x,y) -> (x',y') of the attention maps (e.g. scaling, translation, etc.). In this repo, the attention maps are already exposed by the code.
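To make the coordinate-space idea concrete, here is a minimal sketch of applying a spatial transform to a single 2D attention map with PyTorch's `affine_grid`/`grid_sample`. The function name and the choice of a center-zoom are my own illustration, not code from this repo or the paper:

```python
import torch
import torch.nn.functional as F

def scale_attention_map(attn_map_2d, factor):
    """Zoom a 2D attention map in (factor > 1) or out (factor < 1)
    about its center. Hypothetical helper for illustration only;
    attn_map_2d is an (H, W) tensor, e.g. one token's attention
    reshaped from the flattened pixel dimension."""
    h, w = attn_map_2d.shape
    # An affine grid with scale 1/factor samples a smaller source
    # region, which enlarges the map's content by `factor`.
    theta = torch.tensor([[[1.0 / factor, 0.0, 0.0],
                           [0.0, 1.0 / factor, 0.0]]])
    grid = F.affine_grid(theta, size=(1, 1, h, w), align_corners=False)
    # Out-of-bounds samples are padded with zeros (no attention).
    scaled = F.grid_sample(attn_map_2d[None, None], grid,
                           align_corners=False, padding_mode="zeros")
    return scaled[0, 0]
```

Translation would work the same way by putting offsets in the last column of `theta`; the paper's operators combine such transforms with guidance on the denoising process rather than editing the maps directly.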

For example, in def new_sliced_attention the code slices and replaces some attention maps with others:

# Replace the masked region of the current attention slice with the
# slice saved from the source prompt's pass.
if self.last_attn_slice_mask is not None:
    # Reorder the saved attention columns to line up with the edited prompt's tokens.
    new_attn_slice = torch.index_select(self.last_attn_slice, -1, self.last_attn_slice_indices)
    # Keep the original attention where the mask is 0, substitute the saved attention where it is 1.
    attn_slice = attn_slice * (1 - self.last_attn_slice_mask) + new_attn_slice * self.last_attn_slice_mask

You could probably select a specific attn_slice that corresponds to an object and scale it to make the object bigger (for best results, the exact implementation details should follow the paper).
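A simpler intensity-space variant of the same idea is to amplify the attention paid to the tokens describing the object and renormalize. This is a hedged sketch of my own (the function name and the renormalization choice are assumptions, not the paper's exact size operator, which acts on the spatial map):

```python
import torch

def amplify_token_attention(attn_slice, token_indices, factor=2.0):
    """Multiply the attention given to selected prompt tokens, then
    renormalize so each pixel's attention still sums to 1.
    attn_slice: (..., tokens) tensor of softmaxed attention weights.
    Hypothetical helper for illustration only."""
    attn = attn_slice.clone()
    # Boost the chosen token columns (e.g. the tokens naming the object).
    attn[..., token_indices] *= factor
    # Renormalize over the token dimension to keep valid attention weights.
    return attn / attn.sum(dim=-1, keepdim=True)
```

In this repo you would apply such a function inside the attention hook, at the same point where the mask-based substitution above happens.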