
# MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance (ACM MM 2024)

This repository is the official implementation of MAG-Edit.

Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou

[Project Website](https://mag-edit.github.io/) | [arXiv](https://arxiv.org/abs/2312.11396)


*Teaser figure: comparison with (a) Blended Latent Diffusion, (b) DiffEdit, (c) Prompt2Prompt, (d) Plug-and-Play, (e) P2P+Blend, and (f) PnP+Blend.*

## :bookmark: Abstract

TL;DR: MAG-Edit is the first training-free method specifically designed for localized image editing in complex scenarios.

<details><summary>CLICK for the full abstract</summary>
Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
</details>
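To make the mechanism concrete, below is a minimal, non-official PyTorch sketch of the inference-stage optimization described in the abstract. `get_edit_token_attention` and the global `unet` are hypothetical stand-ins, and the real method maximizes two mask-based constraints rather than the single ratio loss shown here.

```python
import torch

def mask_ratio_loss(attn, mask):
    # Encourage the edit token's cross-attention mass to fall inside the
    # edit mask (one mask-based constraint; the second is analogous).
    inside = (attn * mask).sum()
    return 1.0 - inside / (attn.sum() + 1e-8)

def guided_denoise_step(z_t, t, text_emb, token_idx, mask, scale=2.5, iters=15):
    # Inference-stage optimization of the noise latent z_t: gradient steps
    # are taken on z_t itself; no model weights are updated (training-free).
    for _ in range(iters):
        z_t = z_t.detach().requires_grad_(True)
        # Hypothetical helper: runs the UNet once and returns the edit
        # token's cross-attention map at the mask's spatial resolution.
        attn = get_edit_token_attention(unet, z_t, t, text_emb, token_idx)
        loss = mask_ratio_loss(attn, mask)
        grad = torch.autograd.grad(loss, z_t)[0]
        z_t = z_t - scale * grad
    return z_t.detach()
```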

## :pencil: Changelog

## :wrench: Setup

```bash
conda create -n mag python=3.8
conda activate mag

pip install -r requirements.txt
```

We use Stable Diffusion v1-4 as the backbone. Please download it from Hugging Face and change the model path on line 26 of code_tr/network.py.
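For reference, one possible way to fetch the weights locally (this assumes the `huggingface_hub` package and the standard `CompVis/stable-diffusion-v1-4` repository id; it is not a step from the official instructions):

```python
from huggingface_hub import snapshot_download

# Download the full Stable Diffusion v1-4 repository into the local HF cache
# and print the resulting directory, which can then be used as the model path.
local_dir = snapshot_download("CompVis/stable-diffusion-v1-4")
print(local_dir)
```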

## Run MAG-Edit (Token Ratio)

To run MAG-Edit, a single GPU with at least 32 GB of VRAM is required. The script code_tr/edit.sh provides an editing example:

```bash
CUDA_VISIBLE_DEVICES=0 python edit.py \
    --source_prompt="there is a set of sofas on the red carpet in the living room" \
    --target_prompt="there is a set of sofas on the yellow carpet in the living room" \
    --target_word="yellow" \
    --img_path="examples/1/1.jpg" \
    --mask_path="examples/1/mask.png" \
    --result_dir="result" \
    --max_iteration=15 \
    --scale=2.5
```

The result is saved to code_tr/result.
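The mask supplied via `--mask_path` marks the region to edit. If you need to prepare one from a rough segmentation, here is a small sketch; the white-foreground convention, threshold, and file names are assumptions rather than details taken from the repo:

```python
import numpy as np
from PIL import Image

# Binarize a rough grayscale mask: white (255) = edit region, black (0) = keep.
# The 127 threshold and both file names are illustrative only.
rough = np.array(Image.open("rough_mask.png").convert("L"))
binary = np.where(rough > 127, 255, 0).astype(np.uint8)
Image.fromarray(binary).save("examples/1/mask.png")
```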

## Various Editing Types

## Other Applications

## Qualitative Comparison

### Comparison with training-free methods

*Figure: results for the simplified prompts "green pillow", "denim pants", "white bird", and "slices of steak", comparing ours with Blended LD, DiffEdit, P2P, and PnP.*
### Comparison with training and finetuning methods

*Figure: results for the simplified prompts "yellow car", "plaid sofa", "tropical fish", and "strawberry", comparing ours with Instruct-Pix2Pix, MagicBrush, and SINE.*
### Comparison with inversion methods

*Figure: results for the simplified prompts "jeep", "floral sofa", and "yellow shirt", comparing ours with StyleDiffusion, ProxNPI, and DirectInversion.*
## :triangular_flag_on_post: Citation

```
@article{mao2023magedit,
  title={MAG-Edit: Localized Image Editing in Complex Scenarios via $\underline{M}$ask-Based $\underline{A}$ttention-Adjusted $\underline{G}$uidance},
  author={Qi Mao and Lan Chen and Yuchao Gu and Zhen Fang and Mike Zheng Shou},
  journal={arXiv preprint arXiv:2312.11396},
  year={2023},
}
```

## :revolving_hearts: Acknowledgements

This repository borrows heavily from [prompt-to-prompt](https://github.com/google/prompt-to-prompt/) and [layout-guidance](https://github.com/silent-chen/layout-guidance). Thanks to the authors for sharing their code and models.