Paper | Supplementary Material | ArXiv | BibTex
This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".
(Top) Input incomplete image, where the missing region is depicted in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.
This code was tested with TensorFlow 1.12.0 (later versions may work, excluding 2.x), CUDA 9.0, Python 3.6 and Ubuntu 16.04.
Clone this repository:
git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git
Set the checkpoints_dir, dataset, train_flist and valid_flist arguments in train_vqvae.py, train_structure_generator.py and train_texture_generator.py.
Modify data/data_loader.py according to the dataset. For CelebA-HQ, we resize each image to 266x266 and randomly crop a 256x256 patch. For Places2 and ImageNet, we randomly crop a 256x256 patch.
Run python train_vqvae.py to train VQ-VAE.
Set the vqvae_network_dir argument in train_structure_generator.py and train_texture_generator.py based on the path of the pre-trained VQ-VAE.
Edit train_structure_generator.py and train_texture_generator.py to choose center mask or random mask.
Run python train_structure_generator.py to train the structure generator.
Run python train_texture_generator.py to train the texture generator.
Set the structure_generator_dir and texture_generator_dir arguments in save_full_model.py based on the paths of the pre-trained structure generator and texture generator.
Run python save_full_model.py to save the whole model.
Set the checkpoints_dir, dataset, img_flist and mask_flist arguments in test.py.
Put model.ckpt.meta, model.ckpt.index, model.ckpt.data-00000-of-00001 and checkpoint under the model_logs/ directory.
Run python test.py.
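The resize-and-crop preprocessing mentioned in the training steps can be illustrated with a small sketch. It only computes the crop coordinates and does not touch image data, so it is not the code in data/data_loader.py. The function names and the "celebahq" dataset key are hypothetical; only the 266x266 resize and the 256x256 crop come from the text above.

```python
import random

CROP_SIZE = 256          # crop size stated in the training steps
CELEBA_RESIZE = 266      # CelebA-HQ resize target stated in the training steps


def random_crop_box(height, width, crop=CROP_SIZE, rng=random):
    """Return (top, left, bottom, right) of a random crop inside the image."""
    if height < crop or width < crop:
        raise ValueError("image is smaller than the crop size")
    top = rng.randint(0, height - crop)    # randint is inclusive on both ends
    left = rng.randint(0, width - crop)
    return top, left, top + crop, left + crop


def plan_preprocessing(dataset, height, width, rng=random):
    """Return the size to crop from and the crop box for one image."""
    # "celebahq" is a hypothetical dataset key, not taken from the repository.
    if dataset == "celebahq":
        # CelebA-HQ images are first resized to 266x266.
        height, width = CELEBA_RESIZE, CELEBA_RESIZE
    # Places2 and ImageNet images are cropped at their original size.
    return (height, width), random_crop_box(height, width, rng=rng)
```

With a 266x266 resize and a 256x256 crop, the top-left corner can shift by at most 10 pixels in each direction, so the CelebA-HQ crop acts as a mild translation augmentation.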
Download the pre-trained models using the following links and put them under the model_logs/ directory.
center_mask model: CelebA-HQ_center | Places2_center | ImageNet_center
random_mask model: CelebA-HQ_random | Places2_random | ImageNet_random
The center_mask models are trained with images of 256x256 resolution with center 128x128 holes. The random_mask models are trained with random regular and irregular holes.
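To make the center-mask geometry concrete, here is a minimal sketch that builds a 256x256 mask with a centered 128x128 hole as a nested list. The function name is hypothetical, and the pixel values (255 for the missing region, 0 for the known region) are an assumption for illustration; check the repository's mask handling before relying on them.

```python
def center_mask(size=256, hole=128, missing=255, known=0):
    """Build a size x size mask with a centered hole x hole missing region."""
    # Pixel values are assumed for illustration (255 = missing, 0 = known).
    start = (size - hole) // 2       # 64 for the default sizes
    end = start + hole               # 192 for the default sizes
    return [
        [missing if start <= row < end and start <= col < end else known
         for col in range(size)]
        for row in range(size)
    ]


mask = center_mask()
missing_pixels = sum(value == 255 for row in mask for value in row)
```

For the default sizes the hole spans rows and columns 64 to 191, so it covers one quarter of the image area.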
One advantage of GAN-based and VAE-based methods is their fast inference speed. We measure that Mutual Encoder-Decoder with Feature Equalizations runs at 0.2 seconds per image on a single NVIDIA 1080 Ti GPU for images of resolution 256×256. In contrast, our model runs at 45 seconds per image. Naive sampling of our autoregressive network is the major source of computation time. Fortunately, this time can be reduced by an order of magnitude using an incremental sampling technique that caches and reuses intermediate states of the network. Consider using this technique for faster inference.
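The idea behind incremental sampling can be shown with a toy example. This is not the network from this repository: step_feature and next_token are hypothetical stand-ins for an expensive per-token computation and a prediction step. The sketch only contrasts recomputing the whole prefix at every step with updating a cached state.

```python
import random


def step_feature(token):
    # Hypothetical stand-in for an expensive per-token computation.
    return (token * 31 + 7) % 17


def next_token(state, rng, vocab=8):
    # Toy prediction: the next token depends on the accumulated state.
    return (state + rng.randrange(vocab)) % vocab


def sample_naive(length, seed=0):
    """Recompute the state over the whole prefix at every step."""
    rng = random.Random(seed)
    tokens, feature_calls = [], 0
    for _ in range(length):
        state = 0
        for tok in tokens:
            state += step_feature(tok)
            feature_calls += 1
        tokens.append(next_token(state, rng))
    return tokens, feature_calls


def sample_incremental(length, seed=0):
    """Keep a cached state and update it once per generated token."""
    rng = random.Random(seed)
    tokens, feature_calls = [], 0
    state = 0
    for _ in range(length):
        tok = next_token(state, rng)
        tokens.append(tok)
        state += step_feature(tok)   # reuse the cached state
        feature_calls += 1
    return tokens, feature_calls


naive_tokens, naive_calls = sample_naive(64)
cached_tokens, cached_calls = sample_incremental(64)
```

Both functions produce the same sequence for the same seed, but the naive version makes a number of feature calls that grows quadratically with the sequence length, while the cached version grows linearly. The speedup for a real network depends on its architecture and on which intermediate states can be cached.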
If our method is useful for your research, please consider citing.
@inproceedings{peng2021generating,
title={Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE},
author={Peng, Jialun and Liu, Dong and Xu, Songcen and Li, Houqiang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={10775-10784},
year={2021}
}