
# MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance (ACM MM 2024)

This repository is the official implementation of MAG-Edit.

Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou

[Project Website](https://mag-edit.github.io/) | [arXiv](https://arxiv.org/abs/2312.11396)


*Teaser figure: comparison with (a) Blended Latent Diffusion, (b) DiffEdit, (c) Prompt2Prompt, (d) Plug-and-Play, (e) P2P+Blend, and (f) PnP+Blend.*

## :bookmark: Abstract

TL;DR: MAG-Edit is the first training-free method specifically designed for localized image editing in complex scenarios.

<details><summary>CLICK for the full abstract</summary>
Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
</details>
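To make the mechanism concrete, below is a minimal, non-official PyTorch sketch of the inference-stage optimization described in the abstract. `get_edit_token_attention` and the global `unet` are hypothetical stand-ins, and the real method maximizes two mask-based constraints rather than the single ratio loss shown here.

```python
import torch

def mask_ratio_loss(attn, mask):
    # Encourage the edit token's cross-attention mass to fall inside the
    # edit mask (one mask-based constraint; the second is analogous).
    inside = (attn * mask).sum()
    return 1.0 - inside / (attn.sum() + 1e-8)

def guided_denoise_step(z_t, t, text_emb, token_idx, mask, scale=2.5, iters=15):
    # Inference-stage optimization of the noise latent z_t: gradient steps
    # are taken on z_t itself; no model weights are updated (training-free).
    for _ in range(iters):
        z_t = z_t.detach().requires_grad_(True)
        # Hypothetical helper: runs the UNet once and returns the edit
        # token's cross-attention map at the mask's spatial resolution.
        attn = get_edit_token_attention(unet, z_t, t, text_emb, token_idx)
        loss = mask_ratio_loss(attn, mask)
        grad = torch.autograd.grad(loss, z_t)[0]
        z_t = z_t - scale * grad
    return z_t.detach()
```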

## :pencil: Changelog

## :wrench: Setup

```bash
conda create -n mag python=3.8
conda activate mag

pip install -r requirements.txt
```

We use Stable Diffusion v1-4 as the backbone. Please download it from Hugging Face and change the model path on line 26 of code_tr/network.py.
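For reference, one possible way to fetch the weights locally (this assumes the `huggingface_hub` package and the standard `CompVis/stable-diffusion-v1-4` repository id; it is not a step from the official instructions):

```python
from huggingface_hub import snapshot_download

# Download the full Stable Diffusion v1-4 repository into the local HF cache
# and print the resulting directory, which can then be used as the model path.
local_dir = snapshot_download("CompVis/stable-diffusion-v1-4")
print(local_dir)
```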

## Run MAG-Edit (Token Ratio)

To run MAG-Edit, a single GPU with at least 32 GB of VRAM is required. The script code_tr/edit.sh provides an editing example:

```bash
CUDA_VISIBLE_DEVICES=0 python edit.py \
    --source_prompt="there is a set of sofas on the red carpet in the living room" \
    --target_prompt="there is a set of sofas on the yellow carpet in the living room" \
    --target_word="yellow" \
    --img_path="examples/1/1.jpg" \
    --mask_path="examples/1/mask.png" \
    --result_dir="result" \
    --max_iteration=15 \
    --scale=2.5
```

The result is saved to code_tr/result.
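The mask supplied via `--mask_path` marks the region to edit. If you need to prepare one from a rough segmentation, here is a small sketch; the white-foreground convention, threshold, and file names are assumptions rather than details taken from the repo:

```python
import numpy as np
from PIL import Image

# Binarize a rough grayscale mask: white (255) = edit region, black (0) = keep.
# The 127 threshold and both file names are illustrative only.
rough = np.array(Image.open("rough_mask.png").convert("L"))
binary = np.where(rough > 127, 255, 0).astype(np.uint8)
Image.fromarray(binary).save("examples/1/mask.png")
```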

## Various Editing Types

## Other Applications

## Qualitative Comparison

### Comparison with training-free methods

*Figure: results for the simplified prompts "green pillow", "denim pants", "white bird", and "slices of steak", comparing ours with Blended LD, DiffEdit, P2P, and PnP.*
### Comparison with training and finetuning methods

*Figure: results for the simplified prompts "yellow car", "plaid sofa", "tropical fish", and "strawberry", comparing ours with Instruct-Pix2Pix, MagicBrush, and SINE.*
### Comparison with inversion methods

*Figure: results for the simplified prompts "jeep", "floral sofa", and "yellow shirt", comparing ours with StyleDiffusion, ProxNPI, and DirectInversion.*
## :triangular_flag_on_post: Citation

```
@article{mao2023magedit,
  title={MAG-Edit: Localized Image Editing in Complex Scenarios via $\underline{M}$ask-Based $\underline{A}$ttention-Adjusted $\underline{G}$uidance},
  author={Qi Mao and Lan Chen and Yuchao Gu and Zhen Fang and Mike Zheng Shou},
  journal={arXiv preprint arXiv:2312.11396},
  year={2023},
}
```

## :revolving_hearts: Acknowledgements

This repository borrows heavily from [prompt-to-prompt](https://github.com/google/prompt-to-prompt/) and [layout-guidance](https://github.com/silent-chen/layout-guidance). Thanks to the authors for sharing their code and models.