[CVPR2023] Masked and Adaptive Transformer for Exemplar Based Image Translation (MATEBIT)

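We use a masked and adaptive attention mechanism to build reliable unsupervised, cross-modal semantic correspondence for exemplar-guided image translation, improving the matching between regions of the content image and regions of the exemplar image;
We use joint quality-style contrastive learning to learn high-quality style representations for global style modulation;
On tasks such as oil painting, traditional Chinese painting, and virtual try-on, the method significantly improves generation quality.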

Abstract

We present a novel framework for exemplar based image translation. Recent advanced methods for this task mainly focus on establishing cross-domain semantic correspondence, which sequentially dominates image generation in the manner of local style control. Unfortunately, cross-domain semantic matching is challenging, and matching errors ultimately degrade the quality of generated images. To overcome this challenge, we improve the accuracy of matching on the one hand, and diminish the role of matching in image generation on the other hand. To achieve the former, we propose a masked and adaptive transformer (MAT) for learning accurate cross-domain correspondence and executing context-aware feature augmentation. To achieve the latter, we use source features of the input and global style codes of the exemplar, as supplementary information, for decoding an image. In addition, we devise a novel contrastive style learning method for acquiring quality-discriminative style representations, which in turn benefit high-quality image generation.

Paper Information

Chang Jiang, Fei Gao, Biao Ma, Yuhao Lin, Nannan Wang, Gang Xu, "Masked and Adaptive Transformer for Exemplar Based Image Translation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 22418-22427.

Sample Results

Some results:

[Image: localFace2]

More Results:

We offer more results here: Google Drive

Prerequisites

Linux or macOS
Python 3.8
PyTorch 1.8
CPU or NVIDIA GPU + CUDA and cuDNN
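
Before training, it can help to confirm that the local environment matches the list above. The snippet below is a minimal sketch, not part of the repository: `environment_report` is a hypothetical helper name, and it only reports what it finds (Python version, OS, PyTorch version, CUDA availability) without enforcing any version.

```python
import importlib.util
import platform


def environment_report():
    """Collect the versions that the prerequisites list refers to.

    Hypothetical helper for illustration; it is not shipped with MATEBIT.
    """
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        "torch": None,
        "cuda_available": False,
    }
    # Import torch only if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, PyTorch is not installed in the active interpreter; if `cuda_available` is `False`, training would fall back to the CPU.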

Getting Started

Preparation

Clone this repo:

git clone https://github.com/AiArt-HDU/MATEBIT
cd MATEBIT

Download the VGG model used for computing the loss from here, and move it to models/.
For dataset preparation, please refer to CocosNet.

Pretrained Models

The pre-trained models need to be saved in ./checkpoint.

Train/Test

1) CelebA-HQ (edge-to-face)

Dataset: Download from here.
Retrieval_pairs: same as Celebahq (edge-to-face)
Train_Val split: same as Celebahq (edge-to-face)

Run the following commands. Note that root_path is your CelebA-HQ root, e.g. /data/Dataset/CelebAMask-HQ.

python train.py --name celebahqedge --dataset_mode celebahqedge --PONO --PONO_C --amp --batchSize 4 --netG dynast --load_size 286 --crop_size 256 --dataroot root_path --contrastive_weight 100.0 --label_nc 15 --niter 30 --niter_decay 30 --gpu_ids 0 --use_atten --vgg_normal_correct --style_weight 0.1  --weight_warp_self 1000.0 --weight_perceptual 0.001 --vgg_path vgg/vgg19_conv.pth --continue_train

python test.py --name celebahqedge --dataset_mode celebahqedge --PONO --PONO_C --amp --batchSize 4 --netG dynast --load_size 256 --crop_size 256 --dataroot root_path --no_flip --which_epoch latest --save_per_img
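
The training command above passes `--niter 30 --niter_decay 30`. The sketch below illustrates what these flags would mean if they follow the convention of the SPADE-style codebases this project builds on: a constant learning rate for `niter` epochs, then a linear decay to zero over `niter_decay` epochs. This is an assumption rather than something confirmed by the repository, and both the function name and the base learning rate of 0.0002 are placeholders chosen for illustration.

```python
def learning_rate(epoch, base_lr, niter, niter_decay):
    """Learning rate at a 1-indexed epoch under a constant-then-linear schedule.

    Assumed behaviour (SPADE-style convention), not taken from the MATEBIT code.
    """
    if epoch <= niter:
        return base_lr
    # Number of decay steps already taken, capped at the decay length.
    decay_steps = min(epoch - niter, niter_decay)
    return base_lr * (1.0 - decay_steps / niter_decay)


if __name__ == "__main__":
    # 0.0002 is a placeholder base learning rate, not a documented default.
    for epoch in (1, 30, 45, 60):
        print(epoch, learning_rate(epoch, 0.0002, niter=30, niter_decay=30))
```

Under this reading, the CelebA-HQ run would last 60 epochs in total, and the DeepFashion run (`--niter 100 --niter_decay 100`) would last 200.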

2) DeepFashion (pose-to-image)

Dataset: Download DeepFashion. We use OpenPose to estimate the poses of the DeepFashion images. Download and unzip the OpenPose results, then move the pose/ folder to DeepFashion/.
Retrieval_pairs: Download deepfashion_ref.txt, deepfashion_ref_test.txt and deepfashion_self_pair.txt from here, and save or replace them in data/.
Train_Val split: Download train.txt and val.txt from here, and save them in DeepFashion/.
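
The DeepFashion steps above place several files in two different locations, so a quick check before training can save a failed run. The following is a minimal sketch that only uses the file and folder names listed above; the function name and the way the two root directories are passed in are assumptions made for illustration.

```python
from pathlib import Path

# Names taken from the preparation steps above.
DATASET_ITEMS = ("pose", "train.txt", "val.txt")  # expected inside DeepFashion/
PAIR_FILES = (
    "deepfashion_ref.txt",
    "deepfashion_ref_test.txt",
    "deepfashion_self_pair.txt",
)  # expected inside data/


def missing_deepfashion_items(dataset_root, data_dir):
    """Return the expected paths that do not exist yet (hypothetical helper)."""
    dataset_root = Path(dataset_root)
    data_dir = Path(data_dir)
    expected = [dataset_root / name for name in DATASET_ITEMS]
    expected += [data_dir / name for name in PAIR_FILES]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    # Both paths are examples; point them at your own DeepFashion root and data/ folder.
    for path in missing_deepfashion_items("DeepFashion", "data"):
        print("missing:", path)
```

An empty result means every item named in the steps above is in place.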

python train.py --PONO --PONO_C --no_flip --video_like --vgg_normal_correct --nThreads 40 --amp --display_winsize 256 --load_size 286 --crop_size 256 --label_nc 3 --batchSize 80 --gpu_ids 0,1,2,3,4,5,6,7 --netG dynast --niter 100 --niter_decay 100 --vgg_path vgg/vgg19_conv.pth --n_layers 3 --use_atten --contrastive_weight 100.0 --style_weight 0.2 --weight_perceptual 0.01 --continue_train --display_freq 5000

python test.py --PONO --PONO_C --no_flip --video_like --vgg_normal_correct --nThreads 16 --amp --display_winsize 256 --load_size 286 --crop_size 256 --label_nc 3 --batchSize 4 --which_epoch latest --save_per_img

3) Other datasets

Download: MetFaces, AAHQ, Ukiyo-e faces
Brush painting, traditional art painting

We collected a dataset of 915 traditional Chinese brush paintings at a resolution of 512 from the Internet: Google Drive

After retrieving similar reference images and making the labels, you can train on the remaining datasets.
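
The README does not say how similar reference images are retrieved for these datasets. One common approach, shown here purely as an illustration and not as the authors' method, is to rank candidates by cosine similarity between precomputed feature vectors. The function names, the `features` dictionary, and the toy vectors below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_references(query_name, features, top_k=1):
    """Return the names of the top_k images most similar to query_name.

    `features` maps image names to feature vectors; how those vectors are
    computed is left open here.
    """
    query = features[query_name]
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in features.items()
        if name != query_name  # never pair an image with itself
    ]
    # Highest similarity first; ties are broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    toy_features = {"a.jpg": [1.0, 0.0], "b.jpg": [0.9, 0.1], "c.jpg": [0.0, 1.0]}
    print(retrieve_references("a.jpg", toy_features, top_k=2))
```

The resulting pairs would still need to be written in whatever format the retrieval-pair files of the existing datasets use, which is not specified here.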

Citation

If you use this code for your research, please cite our paper.

@inproceedings{jiang2023masked,
  title={Masked and Adaptive Transformer for Exemplar Based Image Translation},
  author={Jiang, Chang and Gao, Fei and Ma, Biao and Lin, Yuhao and Wang, Nannan and Xu, Gang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={22418--22427},
  year={2023}
}

Acknowledgments

This code borrows heavily from DynaST and MMTN. We also thank the authors of the Synchronized Batch Normalization implementation.
