TL;DR
An image-to-image translation framework that converts an input image into a latent (style) representation and feeds it into the generator of a pretrained StyleGAN. Because the input is encoded into a latent representation, the image can be transformed without being bound by the pixel information of the input (e.g., turning a face to a frontal view). In addition, the framework can perform a wide range of pix2pix-style tasks (e.g., super-resolution, face frontalization).
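The encode-then-generate pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `encoder` and `stylegan_generator` are hypothetical callables standing in for the trained encoder network and the frozen pretrained StyleGAN generator.

```python
def translate(image, encoder, stylegan_generator):
    """Image-to-image translation via a latent (style) representation.

    encoder: maps an input image to StyleGAN latent code(s) (hypothetical).
    stylegan_generator: pretrained, frozen generator that maps latents
    back to an image (hypothetical).
    """
    # Encode the input into the latent space; downstream editing is no
    # longer tied to the input's pixel layout.
    w = encoder(image)
    # Decode the latent with the pretrained generator.
    return stylegan_generator(w)
```

Because only the latent code is passed to the generator, tasks like frontalization or super-resolution reduce to training an appropriate encoder while the generator stays fixed.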
Why it matters:
Paper URL
https://arxiv.org/abs/2008.00951
Submission date (yyyy/mm/dd)
2020/08/03
Authors and institutions
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or (Penta-AI; Tel-Aviv University)
Methods
The network structure is based on a Feature Pyramid Network, commonly used in object detection, to extract a style representation. Training does not use adversarial learning; instead it combines three objectives: a per-pixel L2 loss, a perceptual loss (where F denotes a pre-trained VGG network), and an identity loss using ArcFace (denoted R).
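The three objectives above can be sketched as a weighted sum. This is an illustrative NumPy sketch, not the paper's code: `F` and `R` are placeholder callables for the VGG feature extractor and the ArcFace embedding network, and the lambda weights are placeholders, not the paper's exact settings.

```python
import numpy as np

def l2_loss(x, y):
    # Per-pixel L2 distance between output and target images.
    return float(np.mean((x - y) ** 2))

def perceptual_loss(x, y, F):
    # L2 distance in the feature space of a pre-trained network F
    # (the paper uses VGG features; here F is any feature extractor).
    return float(np.mean((F(x) - F(y)) ** 2))

def identity_loss(x, y, R):
    # One minus cosine similarity of face-recognition embeddings
    # (the paper uses ArcFace as R).
    ex, ey = R(x), R(y)
    cos = np.dot(ex, ey) / (np.linalg.norm(ex) * np.linalg.norm(ey))
    return float(1.0 - cos)

def total_loss(x, y, F, R, lam_l2=1.0, lam_perc=0.8, lam_id=0.1):
    # Weighted sum of the three objectives; the lambda values are
    # illustrative placeholders, not the paper's reported weights.
    return (lam_l2 * l2_loss(x, y)
            + lam_perc * perceptual_loss(x, y, F)
            + lam_id * identity_loss(x, y, R))
```

Note there is no discriminator term: the realism of the output comes entirely from the frozen pretrained StyleGAN generator, so the encoder can be trained with these reconstruction-style losses alone.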
Results
This framework can be used to perform a wide range of pix2pix-style tasks (e.g., super-resolution, face frontalization).
Comments