TL;DR
An image-to-image translation framework that converts an input image into a latent (style) representation and feeds it into the generator of a pretrained StyleGAN. Because the input is encoded into a latent representation, the image can be transformed without being bound by the pixel information of the input (e.g., turning a face to a frontal view). In addition, the framework can perform a wide range of pix2pix-style tasks (e.g., super-resolution, face frontalization).
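The encode-then-generate pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `encoder` and `stylegan_generator` are hypothetical callables standing in for the trained encoder network and the frozen pretrained StyleGAN generator.

```python
def translate(image, encoder, stylegan_generator):
    """Image-to-image translation via a latent (style) representation.

    encoder: maps an input image to StyleGAN latent code(s) (hypothetical).
    stylegan_generator: pretrained, frozen generator that maps latents
    back to an image (hypothetical).
    """
    # Encode the input into the latent space; downstream editing is no
    # longer tied to the input's pixel layout.
    w = encoder(image)
    # Decode the latent with the pretrained generator.
    return stylegan_generator(w)
```

Because only the latent code is passed to the generator, tasks like frontalization or super-resolution reduce to training an appropriate encoder while the generator stays fixed.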
Why it matters:
Paper URL
https://arxiv.org/abs/2008.00951
Submission date (yyyy/mm/dd)
2020/08/03
Authors and institutions
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or (Penta-AI; Tel-Aviv University)
Methods
The network structure is based on a Feature Pyramid Network, commonly used in object detection, to extract a style representation. Training does not use adversarial learning; instead it combines three objectives: a per-pixel L2 loss, a perceptual loss (where F denotes a pre-trained VGG network), and an identity loss using ArcFace (denoted R).
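The three objectives above can be sketched as a weighted sum. This is an illustrative NumPy sketch, not the paper's code: `F` and `R` are placeholder callables for the VGG feature extractor and the ArcFace embedding network, and the lambda weights are placeholders, not the paper's exact settings.

```python
import numpy as np

def l2_loss(x, y):
    # Per-pixel L2 distance between output and target images.
    return float(np.mean((x - y) ** 2))

def perceptual_loss(x, y, F):
    # L2 distance in the feature space of a pre-trained network F
    # (the paper uses VGG features; here F is any feature extractor).
    return float(np.mean((F(x) - F(y)) ** 2))

def identity_loss(x, y, R):
    # One minus cosine similarity of face-recognition embeddings
    # (the paper uses ArcFace as R).
    ex, ey = R(x), R(y)
    cos = np.dot(ex, ey) / (np.linalg.norm(ex) * np.linalg.norm(ey))
    return float(1.0 - cos)

def total_loss(x, y, F, R, lam_l2=1.0, lam_perc=0.8, lam_id=0.1):
    # Weighted sum of the three objectives; the lambda values are
    # illustrative placeholders, not the paper's reported weights.
    return (lam_l2 * l2_loss(x, y)
            + lam_perc * perceptual_loss(x, y, F)
            + lam_id * identity_loss(x, y, R))
```

Note there is no discriminator term: the realism of the output comes entirely from the frozen pretrained StyleGAN generator, so the encoder can be trained with these reconstruction-style losses alone.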
Results
This framework can be used to perform a wide range of pix2pix-style tasks (e.g., super-resolution, face frontalization).
Comments