PILOT: Coherent and Multi-modality Image Inpainting via Latent Space Optimization

Official implementation of PILOT.

Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

Method Overview

Getting Started

We recommend working in a dedicated virtual environment, for example one created with conda. After activating it, install a PyTorch build that is compatible with your CUDA version, and then install the packages listed in requirements.txt.

git clone https://github.com/Lingzhi-Pan/PILOT.git
cd PILOT
conda create -n pilot python==3.9
conda activate pilot
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Download the stable-diffusion-v1-5 model from https://huggingface.co/runwayml/stable-diffusion-v1-5 and save it to a local directory.
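
As an alternative to downloading through the browser, the snippet below is a minimal sketch that fetches the model with the `huggingface_hub` package and then checks that the target folder looks like a diffusers-format checkpoint. It rests on several assumptions: that `huggingface_hub` is installed (this README does not say whether requirements.txt includes it), that the repository id from the link above is still available (substitute the current id if it has moved), and that PILOT loads the base model from a diffusers-style directory. The helper names `download_sd15` and `missing_entries` are illustrative and not part of this repository.

```python
from pathlib import Path

# Entries expected in a diffusers-format Stable Diffusion v1.5 folder.
# Assumption: PILOT loads the base model from such a directory.
EXPECTED_ENTRIES = (
    "model_index.json",
    "unet",
    "vae",
    "text_encoder",
    "tokenizer",
    "scheduler",
)


def missing_entries(model_dir):
    """Return the expected files/folders that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def download_sd15(local_dir, repo_id="runwayml/stable-diffusion-v1-5"):
    """Download the model snapshot into local_dir and return that path."""
    # Imported lazily so the layout check above works without the package.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return local_dir
```

For instance, calling `download_sd15("./models/stable-diffusion-v1-5")` and then `missing_entries("./models/stable-diffusion-v1-5")` should give an empty list once the download has completed; the `./models/...` path is only an example.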

Run Examples

We provide three types of conditions to guide the inpainting process: text, spatial controls, and reference images. Each type of condition corresponds to a different configuration file in the configs/ directory.

Text-guided

Set the model_path parameter in the config file to the directory where you saved the SD model, and then run the following command:

python run_example.py --config_file configs/t2i_step50.yaml
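
If you would rather script the model_path change than edit the file by hand, the sketch below rewrites that entry in place. It assumes model_path appears on its own line as `model_path: <value>` (possibly indented); that layout is a guess about the config files rather than something this README confirms, and `set_model_path` is a hypothetical helper, not part of the repository. Any trailing comment on that line is dropped.

```python
import re
from pathlib import Path


def set_model_path(config_file, model_path):
    """Point the `model_path:` entry of a YAML config at model_path.

    Assumption: the key sits on its own line as `model_path: <value>`.
    Returns True if a line was rewritten, False if the key was not found.
    """
    path = Path(config_file)
    text = path.read_text(encoding="utf-8")
    # [ \t]* (not \s*) keeps the match on a single line.
    pattern = re.compile(r"^([ \t]*model_path[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement keeps backslashes in the path literal, and
    # YAML single quotes do not interpret them as escapes either.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + "'" + str(model_path) + "'", text, count=1
    )
    if count:
        path.write_text(new_text, encoding="utf-8")
    return bool(count)
```

A call such as `set_model_path("configs/t2i_step50.yaml", "/path/to/stable-diffusion-v1-5")` would then be followed by the same run_example.py command; the model directory shown here is a placeholder.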

Text + Spatial Controls

Spatial controls can be introduced with either ControlNet or T2I-Adapter; both are supported, but we recommend ControlNet. First, download a ControlNet checkpoint, for example the ControlNet conditioned on scribble images published by Lvmin Zhang: https://huggingface.co/lllyasviel/sd-controlnet-scribble. Then run the following command:

python run_example.py --config_file configs/controlnet_step30.yaml

You can also download other ControlNet models published by Lvmin Zhang to enable inpainting with other conditions, such as Canny edge maps, segmentation maps, and normal maps.

Text + Reference Image

Download the IP-Adapter checkpoint from https://huggingface.co/h94/IP-Adapter, and then run the following command:

python run_example.py --config_file configs/ipa_step50.yaml

Text + Spatial Controls + Reference Image

You can also use ControlNet and IP-Adapter together for multi-condition control:

python run_example.py --config_file configs/ipa_controlnet_step30.yaml
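
To try the example configurations one after another, a small driver script can build one command per config file. The sketch below uses only the script and config names that appear in this README; `build_command` and `run_all` are hypothetical helpers, and it assumes you call them from the repository root with the required checkpoints already downloaded and the config paths already set.

```python
import subprocess
import sys

# Config files named in this README, in the order they are introduced.
CONFIGS = [
    "configs/t2i_step50.yaml",
    "configs/controlnet_step30.yaml",
    "configs/ipa_step50.yaml",
    "configs/ipa_controlnet_step30.yaml",
]


def build_command(config_file):
    """Return the argv list equivalent to the README command for one config."""
    return [sys.executable, "run_example.py", "--config_file", config_file]


def run_all(configs=CONFIGS, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for config_file in configs:
        command = build_command(config_file)
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing example.
            subprocess.run(command, check=True)
```

With the default `dry_run=True`, `run_all()` only prints the commands, which is a cheap way to confirm the list before starting long GPU runs with `run_all(dry_run=False)`.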

Personalized Image Inpainting

You can also integrate LoRA weights into the base model, or replace the base model with another personalized text-to-image (T2I) model, to achieve personalized image inpainting. For example, replacing the base model with a T2I model fine-tuned with DreamBooth on several photos of a particular dog lets PILOT generate that dog inside the masked region while preserving its identity.

See our paper for more information!

BibTeX

If you find this work helpful, please consider citing:

@article{pan2024coherent,
  title={Coherent and Multi-modality Image Inpainting via Latent Space Optimization},
  author={Pan, Lingzhi and Zhang, Tong and Chen, Bingyuan and Zhou, Qi and Ke, Wei and S{\"u}sstrunk, Sabine and Salzmann, Mathieu},
  journal={arXiv preprint arXiv:2407.08019},
  year={2024}
}
