USTC-JialunPeng / Diverse-Structure-Inpainting

CVPR 2021: "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE"
MIT License

Diverse Structure Inpainting

Paper | Supplementary Material | arXiv | BibTeX

This repository is for the CVPR 2021 paper, "Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE".

Introduction

(Top) Incomplete input image, with the missing region shown in gray. (Middle) Visualization of the generated diverse structures. (Bottom) Output images of our method.

Places2 Results

*Results on the Places2 validation set using the center-mask Places2 model.*

CelebA-HQ Results

*Results on one CelebA-HQ test image with different holes using the random-mask CelebA-HQ model.*

Installation

This code was tested with TensorFlow 1.12.0 (later 1.x versions may work; 2.x is not supported), CUDA 9.0, Python 3.6, and Ubuntu 16.04.

Clone this repository:

git clone https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.git

Datasets

Training

Testing

Pre-trained Models

Download the pre-trained models from the following links and put them under the model_logs/ directory.

The center_mask models are trained on 256x256 images with a 128x128 hole in the center. The random_mask models are trained with random regular and irregular holes.
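For reference, the sketch below shows the center-mask layout described above. It is a hypothetical NumPy helper, not the repository's own code, and it assumes a convention where 1 marks missing pixels and 0 marks known pixels.

import numpy as np

# Hypothetical helper (not the repository's code): build a center mask in the
# layout described above -- a 256x256 image with a 128x128 hole in the middle.
# Assumed convention: 1 marks missing pixels, 0 marks known pixels.
def center_mask(img_size=256, hole_size=128):
    mask = np.zeros((img_size, img_size), dtype=np.float32)
    start = (img_size - hole_size) // 2
    mask[start:start + hole_size, start:start + hole_size] = 1.0
    return mask

mask = center_mask()
print(mask.shape, int(mask.sum()))  # (256, 256) 16384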

Inference Time

One advantage of GAN-based and VAE-based methods is their fast inference. We measure that Mutual Encoder-Decoder with Feature Equalizations runs at about 0.2 seconds per 256x256 image on a single NVIDIA 1080 Ti GPU, whereas our model takes about 45 seconds per image. Naive sampling of our autoregressive network accounts for most of this time. Fortunately, this time can be reduced by an order of magnitude with an incremental sampling technique that caches and reuses intermediate states of the network. Consider using this technique for faster inference.
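To illustrate the caching idea (this is a toy sketch, not the repository's implementation), the snippet below contrasts naive autoregressive sampling, which re-evaluates the full history at every step, with a cached variant that reuses previously computed state. All names are hypothetical, and a real deep network would cache each layer's intermediate activations rather than just a window of past samples.

import numpy as np

# Toy causal model: each output depends on the previous `kernel` samples
# through a fixed weight vector. Deep autoregressive networks stack many
# such layers; the caching idea is the same.
rng = np.random.default_rng(0)
kernel = 4
weights = rng.normal(size=kernel)

def step_logit(window):
    # window: the last `kernel` samples (zero-padded at the start)
    return float(window @ weights)

def sample_naive(length):
    # Rebuilds and re-evaluates the history at every step -- this repeated
    # work is what makes naive autoregressive sampling slow.
    seq = []
    for t in range(length):
        padded = np.zeros(kernel)
        hist = seq[max(0, t - kernel):t]
        if hist:
            padded[-len(hist):] = hist
        p = 1.0 / (1.0 + np.exp(-step_logit(padded)))
        seq.append(float(rng.random() < p))
    return seq

def sample_cached(length):
    # Keeps a rolling cache of the last `kernel` samples so each step only
    # touches the new position, mirroring incremental sampling that caches
    # and reuses intermediate network states.
    cache = np.zeros(kernel)
    seq = []
    for _ in range(length):
        p = 1.0 / (1.0 + np.exp(-step_logit(cache)))
        x = float(rng.random() < p)
        seq.append(x)
        cache = np.roll(cache, -1)
        cache[-1] = x
    return seq

print(sample_naive(8))
print(sample_cached(8))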

Citing

If our method is useful for your research, please consider citing.

@inproceedings{peng2021generating,
  title={Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE},
  author={Peng, Jialun and Liu, Dong and Xu, Songcen and Li, Houqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={10775-10784},
  year={2021}
}