TencentARC / MasaCtrl

[ICCV 2023] Consistent Image Synthesis and Editing
https://ljzycmd.github.io/projects/MasaCtrl/
Apache License 2.0

Reproducing editing results #41

Open momopusheen opened 7 months ago

momopusheen commented 7 months ago

Hi,

Thanks for your brilliant work!

I used the vanilla implementation provided in playground_real.ipynb for image editing, without making any modifications, but got unexpected results.

For example, with the target prompt "a photo of a pixar superhero in NYC", the structure of the edited image did not align well with the original image.

I've installed the correct version of diffusers (0.15.0). The base model is SD v1.4.

Have you encountered similar cases on your end? Any insights or suggestions you could provide would be greatly appreciated. Thanks for your support in advance.

Here are my editing results:

Target prompt: "a photo of a pixar superhero in NYC"

image

Target prompt: "a photo of a bronze horse in a museum"

image

ljzycmd commented 7 months ago

Hi @momopusheen, thanks for your interest. Note that MasaCtrl is designed for non-rigid editing, which tries to keep the content consistent while the layout is allowed to change. If you want to keep the layout unchanged after editing (for example, when changing the global style or a local object), you can use the following attention editor:

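import torch

# Assumed import path; AttentionBase is defined in the MasaCtrl repository.
from masactrl.masactrl_utils import AttentionBase


class MutualSelfAttentionStyle(AttentionBase):
    """
    Change the style of the original image while keeping its layout.
    """
    def __init__(self, end_step=25):
        super().__init__()
        # Reference self-attention maps are reused for steps before end_step.
        self.end_step = end_step

    def forward(self, q, k, v, sim, attn, is_cross, place_in_unet, num_heads, **kwargs):
        # Only self-attention is modified; cross-attention passes through unchanged.
        if not is_cross and self.cur_step < self.end_step:
            # Batch layout assumed here: [uncond ref, uncond cur, cond ref, cond cur].
            attn_u_ref, attn_u_cur, attn_c_ref, attn_c_cur = attn.chunk(4)
            # Overwrite the edited branch's maps with the reference branch's maps.
            attn = torch.cat([attn_u_ref, attn_u_ref, attn_c_ref, attn_c_ref], dim=0)

        return super().forward(q, k, v, sim, attn, is_cross, place_in_unet, num_heads, **kwargs)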

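In this editor, the self-attention maps of the reference (source) branch replace those of the edited branch for every step before end_step. Since these maps largely determine the spatial layout, the edited image keeps the layout of the original, which makes the editor suitable for style editing and local object editing.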
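To make the map replacement concrete, here is a minimal standalone sketch on a toy tensor. It is not part of the MasaCtrl code: the helper name swap_in_reference_attention, the tensor sizes, and the random inputs are all hypothetical, and the batch order [uncond ref, uncond cur, cond ref, cond cur] is an assumption read off the variable names in the editor above.

```python
import torch


def swap_in_reference_attention(attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: copy the reference maps over the edited ones.

    Assumes the batch dimension is ordered as
    [uncond ref, uncond cur, cond ref, cond cur], each chunk holding
    num_heads attention maps.
    """
    attn_u_ref, attn_u_cur, attn_c_ref, attn_c_cur = attn.chunk(4)
    return torch.cat([attn_u_ref, attn_u_ref, attn_c_ref, attn_c_ref], dim=0)


torch.manual_seed(0)
num_heads, tokens = 2, 3  # toy sizes, chosen only for illustration

# Fake self-attention maps: 4 branches x num_heads, each row sums to 1.
attn = torch.softmax(torch.randn(4 * num_heads, tokens, tokens), dim=-1)
out = swap_in_reference_attention(attn)

# The edited branches now carry exactly the reference branches' maps.
print(torch.equal(out[num_heads:2 * num_heads], attn[:num_heads]))
print(torch.equal(out[3 * num_heads:], attn[2 * num_heads:3 * num_heads]))
```

The reference chunks themselves are left untouched, so the source image is still reconstructed as before; only the edited branch is forced to attend with the source's spatial pattern.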

Hope this helps.