Official implementation of PILOT.
Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Susstrunk, Mathieu Salzmann
We recommend creating and using a dedicated virtual environment, for example with conda. Then install the PyTorch build that matches the CUDA version on your machine, followed by the required packages listed in requirements.txt.
git clone https://github.com/Lingzhi-Pan/PILOT.git
cd PILOT
conda create -n pilot python==3.9
conda activate pilot
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
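After the installation steps above, a quick check can confirm that the PyTorch build sees your GPU. The snippet below is only a minimal sketch and not part of the repository; the helper name `describe_torch_install` is hypothetical.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the local PyTorch install (hypothetical helper)."""
    # Avoid an ImportError if torch has not been installed in this environment yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, "
        f"CUDA available: {torch.cuda.is_available()}, "
        f"built for CUDA: {torch.version.cuda}"
    )


print(describe_torch_install())
```

If the summary reports that CUDA is not available, the installed wheel probably does not match your driver, and the `--index-url` in the install command may need a different CUDA tag.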
You can download the stable-diffusion-v1-5 model from https://huggingface.co/runwayml/stable-diffusion-v1-5 and save it to a local path.
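The download can also be scripted. The following is a sketch that assumes the `huggingface_hub` package is installed (it is not confirmed to be in requirements.txt); the helper names `repo_id_from_url` and `download_model` and the example target directory are hypothetical.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the "owner/name" repository id from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"cannot find an owner/name pair in {url!r}")
    return "/".join(parts[:2])


def download_model(url: str, local_dir: str) -> str:
    """Download every file of the repository behind `url` into `local_dir`."""
    # Imported lazily so the URL helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


# Example call (the target directory is an assumption, pick any local path):
# download_model(
#     "https://huggingface.co/runwayml/stable-diffusion-v1-5",
#     "./models/stable-diffusion-v1-5",
# )
```

The same helper should work for the other Hugging Face checkpoints referenced in this README, since it only relies on the URL.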
We provide three types of conditions to guide the inpainting process: text, spatial controls, and reference images. Each condition corresponds to a different configuration file in the directory configs/.
Modify the model_path parameter in the config file to point to the directory where you saved your SD model, and then run the following command:
python run_example.py --config_file configs/t2i_step50.yaml
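If you prefer to script the model_path change rather than edit the file by hand, something like the following may work. It is a sketch that assumes model_path appears as a simple `model_path: value` line in the YAML file, which we have not verified for every config; the helper `set_model_path` is hypothetical and uses only the standard library.

```python
import json
import re

# Matches one "model_path: ..." line and captures everything up to the value.
_MODEL_PATH_LINE = re.compile(r"^([ \t]*model_path[ \t]*:[ \t]*).*$", re.MULTILINE)


def set_model_path(config_text: str, new_path: str) -> str:
    """Return `config_text` with the first model_path value replaced by `new_path`.

    Anything after the key on that line, including a trailing comment, is dropped.
    """
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    quoted = json.dumps(new_path)
    new_text, count = _MODEL_PATH_LINE.subn(
        lambda m: m.group(1) + quoted, config_text, count=1
    )
    if count == 0:
        raise KeyError("no model_path key found in the config text")
    return new_text


# Example usage (the local model directory is an assumption):
# from pathlib import Path
# cfg = Path("configs/t2i_step50.yaml")
# cfg.write_text(set_model_path(cfg.read_text(), "./models/stable-diffusion-v1-5"))
```

A text-level replacement is used here instead of a YAML round trip so that comments and key order in the rest of the file are left untouched.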
Spatial controls can be introduced with either ControlNet or T2I-Adapter; we support both but recommend ControlNet. First, download a ControlNet checkpoint, such as the ControlNet conditioned on scribble images published by Lvmin Zhang, from https://huggingface.co/lllyasviel/sd-controlnet-scribble. Then run the command below:
python run_example.py --config_file configs/controlnet_step30.yaml
You can also download other ControlNet models published by Lvmin Zhang to enable inpainting with other conditions, such as Canny edge maps, segmentation maps, and normal maps.
Download the IP-Adapter checkpoint from https://huggingface.co/h94/IP-Adapter, and then run the following command:
python run_example.py --config_file configs/ipa_step50.yaml
You can also use ControlNet and IP-Adapter together to achieve multi-condition controls:
python run_example.py --config_file configs/ipa_controlnet_step30.yaml
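To try several of the configurations above in one go, a small driver script can call run_example.py once per config. This is a sketch only: it assumes it is started from the repository root, and the helpers `build_command` and `run_all` are hypothetical names, not part of PILOT.

```python
import subprocess
import sys

# Config files named in this README.
CONFIGS = [
    "configs/t2i_step50.yaml",
    "configs/controlnet_step30.yaml",
    "configs/ipa_step50.yaml",
    "configs/ipa_controlnet_step30.yaml",
]


def build_command(config_file: str) -> list[str]:
    """Build the argument list equivalent to the shell commands shown above."""
    # sys.executable keeps the run inside the currently active environment.
    return [sys.executable, "run_example.py", "--config_file", config_file]


def run_all(configs: list[str]) -> None:
    """Run each config in order and stop at the first failing command."""
    for config_file in configs:
        print(f"Running {config_file}")
        subprocess.run(build_command(config_file), check=True)


# Call run_all(CONFIGS) from the repository root once the checkpoints are in place.
```

Running the configs sequentially rather than in parallel avoids loading several diffusion models onto the same GPU at once.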
You can also integrate LoRA weights into the base model, or replace the base model with another personalized text-to-image (T2I) model, to achieve personalized image inpainting. For example, replacing the base model with a T2I model fine-tuned with DreamBooth on several photos of a particular dog lets you generate that dog inside the masked region while preserving its identity.
See our paper for more information!
If you find this work helpful, please consider citing:
@article{pan2024coherent,
title={Coherent and Multi-modality Image Inpainting via Latent Space Optimization},
author={Pan, Lingzhi and Zhang, Tong and Chen, Bingyuan and Zhou, Qi and Ke, Wei and S{\"u}sstrunk, Sabine and Salzmann, Mathieu},
journal={arXiv preprint arXiv:2407.08019},
year={2024}
}