Official PyTorch implementation of the paper "Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model"
[Project Page] [Dataset] [Video] [Arxiv] [Hugging Face Demo]
We are open to any suggestions and discussions; feel free to contact us at liruizhao@stu.xmu.edu.cn.
## Installation
1. Clone this repository
git clone https://github.com/OpenGVLab/Diffree.git
cd Diffree
2. Install the required packages
conda create -n diffree python=3.8.5
conda activate diffree
pip install -r requirements.txt
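Optionally, you can verify the environment after installation. Below is a minimal sanity check; it assumes `requirements.txt` installs PyTorch (the repository is a PyTorch implementation).

```python
# Quick environment check (optional). Assumes PyTorch was installed via requirements.txt.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```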
## Inference
1. Download the Diffree model from Hugging Face.
pip install huggingface_hub
huggingface-cli download LiruiZhao/Diffree --local-dir ./checkpoints
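Alternatively, you can download the checkpoint from Python instead of the CLI. Below is a minimal sketch using `huggingface_hub`, with the same repository ID and target directory as the command above.

```python
# Download the Diffree checkpoint into ./checkpoints (equivalent to the CLI command above).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="LiruiZhao/Diffree", local_dir="./checkpoints")
```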
2. You can run inference with the following script:
python app.py
Specifically, `--resolution` defines the maximum size of both the resized input image and the output image. For our <a href="https://huggingface.co/spaces/LiruiZhao/Diffree">Hugging Face Demo</a>, we set `--resolution` to `512` to provide a better user experience with higher-resolution results. However, Diffree was trained with `--resolution` set to `256`, so reducing `--resolution` may improve results (e.g., consider trying `320`).
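For example, to launch the demo at a lower resolution using the `--resolution` flag described above:

python app.py --resolution 320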
## Data Download
You can download OABench here, which is used for training Diffree.
1. Download the OABench dataset from Hugging Face.
huggingface-cli download --repo-type dataset LiruiZhao/OABench --local-dir ./dataset --local-dir-use-symlinks False
2. Find and extract all compressed files in the dataset directory
cd dataset
ls *.tar.gz | xargs -n1 tar xvf
The data structure should look like this:
    |-- dataset
        |-- original_images
            |-- 58134.jpg
            |-- 235791.jpg
            |-- ...
        |-- inpainted_images
            |-- 58134
                |-- 634757.jpg
                |-- 634761.jpg
                |-- ...
            |-- 235791
                |-- ...
        |-- mask_images
            |-- 58134
                |-- 634757.png
                |-- 634761.png
                |-- ...
            |-- 235791
                |-- ...
        |-- annotations.json
In the `inpainted_images` and `mask_images` directories, the top-level folders correspond to the original images, and each folder contains the inpainted images and masks for that original image.
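To illustrate this layout, here is a small sketch (not part of the official code) that pairs each inpainted image with its mask for a given original image; the function name and hard-coded paths are placeholders based on the structure above.

```python
# Illustrative only: walk the OABench layout described above and pair each
# inpainted image with its corresponding mask for one original image.
from pathlib import Path

DATASET_ROOT = Path("dataset")  # adjust to where you extracted OABench

def triplets_for(original_id: str):
    original = DATASET_ROOT / "original_images" / f"{original_id}.jpg"
    inpainted_dir = DATASET_ROOT / "inpainted_images" / original_id
    mask_dir = DATASET_ROOT / "mask_images" / original_id
    for inpainted in sorted(inpainted_dir.glob("*.jpg")):
        mask = mask_dir / f"{inpainted.stem}.png"  # masks share the same file stem
        if mask.exists():
            yield original, inpainted, mask

# Example: list all (original, inpainted, mask) triplets for image 58134
for triplet in triplets_for("58134"):
    print(*triplet)
```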
## Training
Diffree is trained by fine-tuning from an initial Stable Diffusion checkpoint.
1. Download a Stable Diffusion checkpoint and move it to the `checkpoints` directory. For our trained models, we used [the v1.5 checkpoint](https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt) as the starting point. You can also use the following command:
curl -L https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.ckpt -o checkpoints/v1-5-pruned-emaonly.ckpt
2. Next, you can start training.
python main.py --name diffree --base config/train.yaml --train --gpus 0,1,2,3
All configurations are stored in the YAML file. If you need custom configuration settings, you can modify `--base` to point to your own config file.
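For example, to train with a custom config (the path below is a placeholder for your own YAML file):

python main.py --name diffree --base config/your_config.yaml --train --gpus 0,1,2,3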
## Citation
If you find this work useful, please consider citing:
    @article{zhao2024diffree,
      title={Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model},
      author={Zhao, Lirui and Yang, Tianshuo and Shao, Wenqi and Zhang, Yuxin and Qiao, Yu and Luo, Ping and Zhang, Kaipeng and Ji, Rongrong},
      journal={arXiv preprint arXiv:2407.16982},
      year={2024}
    }