This is the official repository for the following paper:
Painterly Image Harmonization using Diffusion Model [arXiv]
Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu, Liqing Zhang
Accepted by ACM MM 2023.
Our PHDiffusion is the first diffusion-based painterly image harmonization method. It can significantly outperform GAN-based methods when the background has dense textures or an abstract style.
Our PHDiffusion has been integrated into our image composition toolbox libcom https://github.com/bcmi/libcom. Welcome to visit and try it \(^▽^)/
In simple cases, our GAN-based PHDNet is sufficiently effective and much more efficient.
Sometimes setting the background style as the target style is not reasonable; this problem is addressed in our ArtoPIH.
Dependencies
Run
pip install -r requirements.txt
Download Models
Please download the following models to the pretrained_models/ folder.
Our pretrained models. You can download either one for testing. The main difference between the two models is that PHDiffusionWithoutRes removes the residual structure in its adapter, while PHDiffusionWithRes retains it. Note that PHDiffusionWithoutRes performs better on some dense-texture styles, learning textures that are closer to the original background textures, while PHDiffusionWithRes preserves the foreground content better. You can choose based on your needs.
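For intuition, here is a minimal PyTorch sketch of that difference (the class name, layer choices, and channel sizes are illustrative assumptions, not the repository's actual adapter):

import torch
import torch.nn as nn

class AdapterBlock(nn.Module):
    # Illustrative adapter block; use_residual toggles the skip connection
    # that distinguishes PHDiffusionWithRes from PHDiffusionWithoutRes.
    def __init__(self, channels: int, use_residual: bool = True):
        super().__init__()
        self.use_residual = use_residual
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.body(x)
        # With the residual, the input feature is added back, which tends to
        # preserve content; without it, the adapter output is free to move
        # further toward the background texture/style.
        return x + out if self.use_residual else out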
Training Data
Data Acquisition
We have two benchmark datasets: COCO and WikiArt.
These datasets are used to create composite images by combining photographic foreground objects from COCO with painterly backgrounds from WikiArt.
Data Processing
During training, we use instance annotations to extract foreground objects from the COCO images and place them onto randomly chosen painterly backgrounds from WikiArt, resulting in 37,931 composite images per epoch. All composite images are then resized to 512 × 512 for training. This process produces composite images whose photographic foregrounds visually conflict with their painterly backgrounds.
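As a rough illustration of this compositing step (the file layout, mask format, and helper name are assumptions, not the repository's actual data pipeline):

import random
from PIL import Image

def make_composite(fg_path, mask_path, bg_paths, size=512):
    # Paste a COCO foreground, cut out by its binary instance mask, onto a
    # randomly chosen WikiArt background, then resize to 512 x 512.
    fg = Image.open(fg_path).convert("RGB")
    mask = Image.open(mask_path).convert("L")   # white = foreground object
    bg = Image.open(random.choice(bg_paths)).convert("RGB")

    bg = bg.resize(fg.size)                     # align canvas sizes
    comp = Image.composite(fg, bg, mask)        # foreground where mask is white
    return comp.resize((size, size)), mask.resize((size, size))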
Train
You can run the following to train the adapter and dual encoder fusion module:
CUDA_VISIBLE_DEVICES="0,1" python -m torch.distributed.launch --nproc_per_node 2 train.py
Test
You can run this to test using the adapter with residual:
CUDA_VISIBLE_DEVICES="0" python -m torch.distributed.launch --nproc_per_node=1 test.py --strength 0.7 --model_resume_path pretrained_models/PHDiffusionWithRes.pth
And run this to test using the adapter without residual:
CUDA_VISIBLE_DEVICES="0" python -m torch.distributed.launch --nproc_per_node=1 test.py --strength 0.7 --model_resume_path pretrained_models/PHDiffusionWithoutRes.pth --no_residual