PyTorch implementation of MasaCtrl: Tuning-free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng
An attention-processor implementation of MasaCtrl is provided in masactrl/masactrl_processor.py and run_synthesis_sdxl_processor.py. You can integrate MasaCtrl into the official Diffusers pipelines by registering the attention processor.
We propose MasaCtrl, a tuning-free method for non-rigid consistent image synthesis and editing. The key idea is to combine the contents of the source image with the layout synthesized from the text prompt and additional controls into the desired synthesized or edited image, by querying semantically correlated features with Mutual Self-Attention Control.
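The core idea of Mutual Self-Attention Control can be illustrated with a minimal NumPy sketch. It assumes a single attention head, no masking, and a batch that stacks the source branch (index 0) and the target branch (index 1). The helper names `attention` and `mutual_self_attention` are hypothetical; this is not the code in masactrl/masactrl_processor.py, and it omits details such as multi-head reshaping and when during sampling the control is applied.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (N, d), k (M, d), v (M, d_v) -> (N, d_v).
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale, axis=-1) @ v


def mutual_self_attention(q, k, v):
    """Hypothetical single-head sketch of mutual self-attention.

    q, k, v have shape (2, N, d); index 0 is the source branch and
    index 1 is the target branch (an assumed batch layout).
    """
    # Source branch: ordinary self-attention, unchanged.
    src = attention(q[0], k[0], v[0])
    # Target branch: target queries attend to the SOURCE keys and values,
    # so each target token gathers content from matching source tokens.
    tgt = attention(q[1], k[0], v[0])
    return np.stack([src, tgt])


# Example: 16 spatial tokens with 8 channels per branch.
rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 2, 16, 8))
print(mutual_self_attention(q, k, v).shape)  # (2, 16, 8)
```

Because the target branch never reads its own keys and values in this sketch, its queries (driven by the target prompt) decide where content goes, while the values it retrieves come from the source image.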
MasaCtrl can perform prompt-based image synthesis and editing that changes the layout while maintaining the contents of the source image.
The target layout is synthesized directly from the target prompt.
Directly modifying the text prompt often fails to produce the desired target layout, so we further integrate our method into existing controllable diffusion pipelines (such as T2I-Adapter and ControlNet) to obtain stable synthesis and editing results.
The target layout is controlled by additional guidance.
Our method also generalizes well to other Stable Diffusion-based models.
With dense consistent guidance, MasaCtrl enables video synthesis.
We implement our method on the diffusers codebase, with a code structure similar to Prompt-to-Prompt. The code runs on Python 3.8.5 with PyTorch 1.11. A Conda environment is highly recommended.
pip install -r requirements.txt
Stable Diffusion: We mainly conduct experiments on Stable Diffusion v1-4, but our method generalizes to other versions (such as v1-5). You can download these checkpoints from their official repositories and Hugging Face.
Personalized Models: You can download personalized models from CIVITAI or train your own customized models.
Notebook demos
To run the synthesis with MasaCtrl, a single GPU with at least 16 GB of VRAM is required.
The notebooks playground.ipynb and playground_real.ipynb provide the synthesis and real-image editing samples, respectively.
Online demos
We provide a Gradio app. Note that you can copy the demo into your own Space to run it on a GPU. An online Colab demo is also available.
Local Gradio demo
You can launch the provided Gradio demo locally with
CUDA_VISIBLE_DEVICES=0 python app.py
MasaCtrl with T2I-Adapter
Install T2I-Adapter and prepare the checkpoints by following its tutorial. Below, we assume it has been installed successfully and that its root directory is T2I-Adapter.
Then copy the core masactrl package and the inference script masactrl_w_adapter.py to the root directory of T2I-Adapter:
cp -r MasaCtrl/masactrl T2I-Adapter/
cp MasaCtrl/masactrl_w_adapter/masactrl_w_adapter.py T2I-Adapter/
[Updates] Alternatively, you can clone the MasaCtrl-w-T2I-Adapter repository directly to your local machine.
Finally, you can run inference with the following command (using the sketch adapter):
python masactrl_w_adapter.py \
--which_cond sketch \
--cond_path_src SOURCE_CONDITION_PATH \
--cond_path CONDITION_PATH \
--cond_inp_type sketch \
--prompt_src "A bear walking in the forest" \
--prompt "A bear standing in the forest" \
--sd_ckpt models/sd-v1-4.ckpt \
--resize_short_edge 512 \
--cond_tau 1.0 \
--cond_weight 1.0 \
--n_samples 1 \
--adapter_ckpt models/t2iadapter_sketch_sd14v1.pth
NOTE: You can download the sketch examples here.
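The flags --cond_tau and --cond_weight are passed through to T2I-Adapter. As we read T2I-Adapter's documentation, cond_tau sets until which fraction of the sampling steps the adapter is applied (a value between 0 and 1), and cond_weight multiplies the adapter features. The sketch below encodes that reading only; the function `adapter_features_for_step` is hypothetical, and the exact step-boundary rule in T2I-Adapter's sampler may differ.

```python
import numpy as np


def adapter_features_for_step(features, step, total_steps, cond_tau=1.0, cond_weight=1.0):
    """Hypothetical helper illustrating the assumed effect of --cond_tau / --cond_weight.

    features: list of adapter feature maps (one per U-Net resolution).
    step: 0-based index of the current sampling step (0 = first, noisiest step).
    Returns the features scaled by cond_weight while the step lies within the
    first cond_tau fraction of the schedule, and None afterwards (no guidance).
    """
    if not 0.0 <= cond_tau <= 1.0:
        raise ValueError("cond_tau must lie in [0, 1]")
    if step < int(cond_tau * total_steps):
        return [cond_weight * f for f in features]
    return None


# Example: with cond_tau=0.5 over 50 steps, guidance is active for the first 25 steps.
feats = [np.ones((1, 4, 8, 8)), np.ones((1, 8, 4, 4))]
active = [adapter_features_for_step(feats, i, 50, cond_tau=0.5) is not None for i in range(50)]
print(sum(active))  # 25
```

Under this reading, the values used in the command above (cond_tau 1.0, cond_weight 1.0) apply the sketch guidance at every step with unscaled features.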
For real images, DDIM inversion is used to map the image to a noise map, so we add an inversion process to the original DDIM sampler. Replace the original file T2I-Adapter/ldm/models/diffusion/ddim.py with the extended version MasaCtrl/masactrl_w_adapter/ddim.py to enable inversion. You can then edit a real image with the following command (using the sketch adapter):
python masactrl_w_adapter.py \
--src_img_path SOURCE_IMAGE_PATH \
--cond_path CONDITION_PATH \
--cond_inp_type image \
--prompt_src "" \
--prompt "a photo of a man wearing black t-shirt, giving a thumbs up" \
--sd_ckpt models/sd-v1-4.ckpt \
--resize_short_edge 512 \
--cond_tau 1.0 \
--cond_weight 1.0 \
--n_samples 1 \
--which_cond sketch \
--adapter_ckpt models/t2iadapter_sketch_sd14v1.pth \
--outdir ./workdir/masactrl_w_adapter_inversion/black-shirt
NOTE: You can download the real image editing example here.
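The inversion added to the sampler can be summarized by running the deterministic DDIM update (eta = 0) in the noising direction. The following is a minimal NumPy sketch, not the code in MasaCtrl/masactrl_w_adapter/ddim.py: it assumes a cumulative alpha-bar schedule given as a decreasing array, replaces the U-Net with a hypothetical `eps_model(x, i)` callable, and omits classifier-free guidance.

```python
import numpy as np


def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update (eta = 0) between two cumulative alphas.

    alpha_to < alpha_from moves toward noise (inversion);
    alpha_to > alpha_from moves toward the image (sampling).
    """
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps


def ddim_invert(x0, eps_model, alphas):
    """Map a clean latent to a noise map; alphas decrease from ~1 toward 0."""
    x = x0
    for i in range(len(alphas) - 1):
        # Reuse the noise predicted at the current step to move one step noisier.
        x = ddim_step(x, eps_model(x, i), alphas[i], alphas[i + 1])
    return x


def ddim_sample(xT, eps_model, alphas):
    """Denoise along the same schedule in the usual direction."""
    x = xT
    for i in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, i), alphas[i], alphas[i - 1])
    return x


# Example with a toy constant noise predictor standing in for the U-Net.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
c = rng.standard_normal((4, 8, 8))
const_eps = lambda x, i: c
alphas = np.linspace(0.999, 0.05, 21)
xT = ddim_invert(x0, const_eps, alphas)
print(np.allclose(ddim_sample(xT, const_eps, alphas), x0))  # True
```

With a constant predictor, each sampling step exactly undoes the matching inversion step, so the round trip is exact. With a real network, inversion reuses the noise predicted at the current step instead of the next one, so reconstruction is only approximate.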
We thank the awesome research works Prompt-to-Prompt and T2I-Adapter.
@InProceedings{cao_2023_masactrl,
author = {Cao, Mingdeng and Wang, Xintao and Qi, Zhongang and Shan, Ying and Qie, Xiaohu and Zheng, Yinqiang},
title = {MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {22560-22570}
}
If you have any comments or questions, please open a new issue or feel free to contact Mingdeng Cao and Xintao Wang.