This repository contains the implementation of the ICLR2024 paper "PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code"
Keywords: Diffusion Model, Image Inversion, Image Editing
Xuan Ju<sup>1,2</sup>, Ailing Zeng<sup>2</sup>, Yuxuan Bian<sup>1</sup>, Shaoteng Liu<sup>1</sup>, Qiang Xu<sup>1</sup>
<sup>1</sup>The Chinese University of Hong Kong  <sup>2</sup>International Digital Economy Academy  *Corresponding Author
Project Page | Arxiv | Readpaper | Benchmark | Code | Video |
Text-guided diffusion models revolutionize image generation and editing, offering exceptional realism and diversity. Specifically, in the context of diffusion-based editing, common practice begins with a source image and a target prompt for editing. It involves obtaining a noisy latent vector corresponding to the source image using the diffusion model, which is then supplied to separate source and target diffusion branches for editing. The accuracy of this inversion process significantly impacts the final editing outcome, influencing both essential content preservation of the source image and edit fidelity according to the target prompt.
Previous inversion techniques attempted to find a unified solution in both the source and target diffusion branches. However, theoretical and empirical analysis shows that, in fact, a disentangling of the two branches leads to a clear separation of the responsibility for essential content preservation and edit fidelity, thus leading to better results in both aspects. In this paper, we introduce a novel technique called “PnP Inversion,” which rectifies inversion deviations directly within the source diffusion branch using just three lines of code, while leaving the target diffusion branch unaltered. To systematically evaluate image editing performance, we present PIE-Bench, an editing benchmark featuring 700 images with diverse scenes and editing types, complemented by versatile annotations. Our evaluation metrics, with a focus on editability and structure/background preservation, demonstrate the superior edit performance and inference speed of PnP Inversion across eight editing methods compared to five inversion techniques.
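The core mechanism can be sketched as follows. This is a minimal, hypothetical sketch assuming a recorded list of DDIM-inversion latents and caller-supplied `unet` and `scheduler_step` callables; the names and signatures are illustrative stand-ins, not the repository's actual API:

```python
def denoise_with_rectified_source_branch(unet, scheduler_step, timesteps,
                                         latent_src, latent_tgt,
                                         src_emb, tgt_emb, inversion_latents):
    """Hedged sketch of the PnP Inversion idea: rectify only the source branch.

    `unet`, `scheduler_step`, and `inversion_latents` are assumed to be provided
    by the caller; they are illustrative, not the repository's API.
    """
    for i, t in enumerate(timesteps):                          # noisy -> clean
        noise_src = unet(latent_src, t, src_emb)               # source-branch prediction
        noise_tgt = unet(latent_tgt, t, tgt_emb)               # target-branch prediction
        latent_src = scheduler_step(noise_src, t, latent_src)  # ordinary DDIM step
        latent_tgt = scheduler_step(noise_tgt, t, latent_tgt)

        # The "3 lines": pull the source branch back onto the recorded inversion
        # path, leaving the target branch untouched so edit fidelity is preserved.
        deviation = inversion_latents[i] - latent_src
        latent_src = latent_src + deviation

    return latent_src, latent_tgt
```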
This is important! Since different models have different Python environment requirements (e.g., the diffusers version), we list the environments in the folder `environment`, with the corresponding scripts grouped as follows:
- `run_editing_p2p.py`, `run_editing_blended_latent_diffusion.py`, `run_editing_stylediffusion.py`, and `run_editing_edit_friendly_p2p.py`
- `run_editing_instructdiffusion.py` and `run_editing_instructpix2pix.py`
- `run_editing_masactrl.py`
- `run_editing_pnp.py`
- `run_editing_pix2pix_zero.py`
- `run_editing_edict.py`
For example, if you want to use the models in `run_editing_p2p.py`, you need to install the environment as follows:
conda create -n p2p python=3.9 -y
conda activate p2p
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r environment/p2p_requirements.txt
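After installing, you can optionally sanity-check the environment; this is a minimal check, not part of the repository:

```python
# Quick sanity check of the installed environment (optional).
import torch
import diffusers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
```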
You can download the benchmark PIE-Bench (Prompt-driven Image Editing Benchmark) here. The data structure is as follows:
|-- data
|   |-- annotation_images
|   |   |-- 0_random_140
|   |   |   |-- 000000000000.jpg
|   |   |   |-- 000000000001.jpg
|   |   |   |-- ...
|   |   |-- 1_change_object_80
|   |   |   |-- 1_artificial
|   |   |   |   |-- 1_animal
|   |   |   |   |   |-- 111000000000.jpg
|   |   |   |   |   |-- 111000000001.jpg
|   |   |   |   |   |-- ...
|   |   |   |   |-- 2_human
|   |   |   |   |-- 3_indoor
|   |   |   |   |-- 4_outdoor
|   |   |   |-- 2_natural
|   |   |   |   |-- ...
|   |   |-- ...
|   |-- mapping_file_ti2i_benchmark.json # the mapping file of the TI2I benchmark; contains editing text
|   |-- mapping_file.json # the mapping file of PIE-Bench; contains editing text, blended word, and mask annotation
PIE-Bench Benchmark:
TI2I Benchmark:
We also include the TI2I benchmark in the data for ease of use. The TI2I benchmark contains 55 images, each with an edited-image prompt. The images are provided in data/annotation_images/ti2i_benchmark and the mapping file is provided in data/mapping_file_ti2i_benchmark.json.
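A minimal sketch of iterating over the PIE-Bench mapping file is shown below. The field names (`image_path`, `original_prompt`, `editing_prompt`) are assumptions based on the description above; check mapping_file.json for the actual schema:

```python
import json
import os

# Minimal sketch, assuming the field names described above; verify against mapping_file.json.
with open("data/mapping_file.json") as f:
    mapping = json.load(f)

for key, item in mapping.items():
    image_path = os.path.join("data/annotation_images", item.get("image_path", ""))  # assumed key
    print(key, image_path, item.get("original_prompt"), item.get("editing_prompt"))
```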
Run the Benchmark
You can run the whole set of image-editing results through `run_editing_p2p.py`, `run_editing_edit_friendly_p2p.py`, `run_editing_masactrl.py`, `run_editing_pnp.py`, `run_editing_edict.py`, `run_editing_pix2pix_zero.py`, `run_editing_instructdiffusion.py`, `run_editing_blended_latent_diffusion.py`, `run_editing_stylediffusion.py`, and `run_editing_instructpix2pix.py`. These Python files contain the following models (please unfold):
For example, if you want to run DirectInversion (Ours) + Prompt-to-Prompt, you can find that this method has the index `directinversion+p2p` in `run_editing_p2p.py`. Then, you can run editing type 0 with DirectInversion (Ours) + Prompt-to-Prompt through:
python run_editing_p2p.py --output_path output --edit_category_list 0 --edit_method_list directinversion+p2p
You can also run multiple editing methods and multiple editing types with:
python run_editing_p2p.py --edit_category_list 0 1 2 3 4 5 6 7 8 9 --edit_method_list directinversion+p2p null-text+p2p
You can also specify --rerun_exist_images to choose whether to re-run existing images, and --data_path and --output_path to set the image path and output path.
Run Any Image
You can process your own images and editing prompts into the same format as our benchmark to run a large number of images. You can also edit the given Python file for your own image: we provide an edited version of `run_editing_p2p.py` as `run_editing_p2p_one_image.py`. You can run editing on a single image through:
python -u run_editing_p2p_one_image.py --image_path scripts/example_cake.jpg --original_prompt "a round cake with orange frosting on a wooden plate" --editing_prompt "a square cake with orange frosting on a wooden plate" --blended_word "cake cake" --output_path "directinversion+p2p.jpg" "ddim+p2p.jpg" --edit_method_list "directinversion+p2p" "ddim+p2p"
We also provide a Jupyter notebook demo, `run_editing_p2p_one_image.ipynb`.
Note that we use default parameters in our code, which are not optimal for all images. You may adjust them based on your inputs.
You can run evaluation through:
python evaluation/evaluate.py --metrics "structure_distance" "psnr_unedit_part" "lpips_unedit_part" "mse_unedit_part" "ssim_unedit_part" "clip_similarity_source_image" "clip_similarity_target_image" "clip_similarity_target_image_edit_part" --result_path evaluation_result.csv --edit_category_list 0 1 2 3 4 5 6 7 8 9 --tgt_methods 1_ddim+p2p 1_directinversion+p2p
You can find the choices for tgt_methods in `evaluation/evaluate.py`, in the dict "all_tgt_image_folders".
All editing results are available for download here. You can download them and arrange them in the file structure below to reproduce all the results in our paper.
output
|-- ddim+p2p
|   |-- annotation_images
|   |   |-- ...
|-- directinversion+p2p
|   |-- annotation_images
|   |   |-- ...
|-- ...
If you want to evaluate the full table of results shown in our paper, you can run:
python evaluation/evaluate.py --metrics "structure_distance" "psnr_unedit_part" "lpips_unedit_part" "mse_unedit_part" "ssim_unedit_part" "clip_similarity_source_image" "clip_similarity_target_image" "clip_similarity_target_image_edit_part" --result_path evaluation_result.csv --edit_category_list 0 1 2 3 4 5 6 7 8 9 --tgt_methods 1 --evaluate_whole_table
Then, all the results in Table 1 will be written to evaluation_result.csv.
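If you want to inspect the CSV programmatically, here is a minimal sketch with pandas (the column names follow the --metrics you selected above):

```python
import pandas as pd

# Minimal sketch: load the metrics written by evaluation/evaluate.py.
results = pd.read_csv("evaluation_result.csv")
print(results.head())
# Columns correspond to the selected --metrics, e.g. "structure_distance", "psnr_unedit_part".
```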
Comparison of PnP Inversion with other inversion techniques across various editing methods:
More results can be found in the main paper.
Performance enhancement from incorporating PnP Inversion into four diffusion-based editing methods:
Visualization results of different inversion and editing techniques:
More results can be found in the main paper.
@article{ju2023direct,
title={PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code},
author={Ju, Xuan and Zeng, Ailing and Bian, Yuxuan and Liu, Shaoteng and Xu, Qiang},
journal={International Conference on Learning Representations ({ICLR})},
year={2024}
}
Our code is modified on the basis of prompt-to-prompt, StyleDiffusion, MasaCtrl, pix2pix-zero, Plug-and-Play, Edit Friendly DDPM Noise Space, Blended Latent Diffusion, Proximal Guidance, and InstructPix2Pix. Thanks to all the contributors!