Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?

Abstract

1. Introduction

The paper proposes an algorithm to embed a given image into the extended latent space W+ of a pre-trained StyleGAN.
It studies multiple questions that give insight into the structure of the StyleGAN latent space. As a result, we can better understand the latent space and how different classes of images are embedded.
It proposes three basic operations on latent vectors to study the quality of the embedding.

2. Related Work

2.1. High-quality GANs

GAN
DCGAN (Deep Convolutional GAN)
ProGAN : generates realistic human faces at a high resolution (1024 x 1024)
BigGAN : trained with large batch sizes; produces smooth interpolations spanning different classes

Problem

There is little control over image modification, which is attributed to the limited interpretability of neural networks.

2.2. Latent Space Embedding

2.3. Perceptual Loss and Style Transfer

3. What images can be embedded into the StyleGAN latent space

3.1. Embedding Results for Various Image Classes

Q. The goal here is to check whether embedding into the latent space also works for datasets other than human faces.

Images of human faces, cats, dogs, paintings, and cars were collected and embedded into the StyleGAN latent space. (The car images were chosen as data that contains no face.)

An image to latent vector to image round trip was attempted. The output image looked similar to the input image, but fine detail was lost.

This reveals the effective embedding capability of the algorithm and the generality of the learned filters of the generator.

+Q. How much does the quality of the pre-trained latent space affect the embedding?

3.2. How Robust is the Embedding of Face Images?

Affine Transformation

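The StyleGAN embedding is very sensitive to affine transformations. In Figure 2, the original image (a) survives the image-to-vector-to-image round trip well, whereas the transformed images (b) to (g) do not: fine detail is not reproduced and the results are blurry.

This shows that affine transformations of the input have a large effect on what the GAN generator reproduces.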
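To repeat this kind of robustness check, one needs transformed copies of an input image. The sketch below builds two such copies with NumPy. The specific transformations (a horizontal shift and a 90-degree rotation), the shift amount, and the helper names are illustrative assumptions; these notes do not list the exact transformations behind panels (b) to (g).

```python
import numpy as np

def translate_right(image, shift):
    """Shift an (H, W, C) image right by `shift` >= 0 pixels, zero-padding the left."""
    out = np.zeros_like(image)
    width = image.shape[1]
    if shift < width:
        out[:, shift:] = image[:, :width - shift]
    return out

def rotate_90(image):
    """Rotate an (H, W, C) image by 90 degrees counter-clockwise."""
    return np.rot90(image, k=1, axes=(0, 1))

# Tiny dummy image standing in for a face photo (hypothetical data).
image = np.arange(2 * 4 * 1).reshape(2, 4, 1)
shifted = translate_right(image, 1)
rotated = rotate_90(image)
print(shifted.shape, rotated.shape)
```

Each transformed copy would then be embedded and regenerated in the same way as the original image, and the reconstructions compared.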

Embedding Defective Images

Defective images were embedded into the StyleGAN latent space.
Figure 3. Some facial attributes were removed before generation, but the regions of the other attributes were not affected.

This behavior is well suited to face editing: it was observed that the embedding does not affect the face as a whole.

3.3. Which Latent Space to Choose?

StyleGAN has two latent spaces (Z and W). A 512-dimensional vector z passes through a fully connected neural network and becomes a 512-dimensional vector w. (The same w is then used 18 times, giving 18 identical 512-dimensional vectors.)

This paper aims to embed into the extended latent space W+. Here W+ consists of 18 different 512-dimensional w vectors. These vectors are fed to the 18 AdaIN inputs of the StyleGAN architecture.
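The difference between W and W+ is easiest to see in terms of array shapes. The following is a minimal sketch, not StyleGAN code: `toy_mapping_network` is a hypothetical stand-in for the fully connected mapping network, and the random perturbation used to build `w_plus` only illustrates that the 18 rows are allowed to differ.

```python
import numpy as np

NUM_LAYERS = 18   # number of AdaIN inputs, as stated in the notes
LATENT_DIM = 512  # dimensionality of z and w

rng = np.random.default_rng(0)

def toy_mapping_network(z, weight):
    """Hypothetical stand-in for the fully connected mapping z -> w."""
    return np.tanh(weight @ z)

weight = rng.standard_normal((LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
z = rng.standard_normal(LATENT_DIM)
w = toy_mapping_network(z, weight)        # shape (512,)

# W: a single w shared by every layer, i.e. 18 identical rows.
w_shared = np.tile(w, (NUM_LAYERS, 1))    # shape (18, 512)

# W+: each layer gets its own 512-dimensional vector.
w_plus = w_shared + 0.1 * rng.standard_normal((NUM_LAYERS, LATENT_DIM))

print(w_shared.shape, w_plus.shape)
```

A code in W has 512 free parameters, while a code in W+ has 18 x 512, which gives the embedding far more room to fit a given image.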

(To be added)

4. How Meaningful is the Embedding?

The observations above suggest that image editing (morphing, expression transfer, and style transfer) and high-quality image generation are possible.

4.1. Morphing

Morphing : linear interpolation between the latent vectors of two given images.

Person to person (rows 1, 2, 3): the transition looks natural.
Animal to animal and painting to car (rows 4, 5): the transition looks unnatural.

This experiment shows that the structure of the StyleGAN latent space is organized around human faces. Face morphing gives excellent results.
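Morphing as described here is plain linear interpolation between two latent codes. A minimal sketch follows, assuming each embedded image is represented as an (18, 512) array in W+; the function name `morph` and the random codes are illustrative, and feeding each intermediate code to the generator is left out.

```python
import numpy as np

def morph(w_a, w_b, num_steps=5):
    """Return `num_steps` latent codes linearly interpolated from w_a to w_b."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - t) * w_a + t * w_b for t in ts])

rng = np.random.default_rng(0)
w_a = rng.standard_normal((18, 512))  # stand-in for the embedding of image A
w_b = rng.standard_normal((18, 512))  # stand-in for the embedding of image B

frames = morph(w_a, w_b, num_steps=5)
print(frames.shape)  # (5, 18, 512)
```

The first and last frames equal the two input codes; each intermediate frame would be passed through the generator to render one step of the morph.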

4.2. Style Transfer

Style Transfer : an image is generated by exchanging part of the values of two given latent codes w_{1} and w_{2} (crossover operation).

The painting styles on the left applied to face photos.
The painting styles on the left applied to faces, paintings, and cars.

In Figure 8, the first 9 layers receive the latent vector of the first row and the remaining 9 layers receive the latent vector of the first column. For non-face images, the style transfer did not work.

This indicates that StyleGAN's ability to represent image style is concentrated in the layers with higher spatial resolution.
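The crossover used for Figure 8 can be sketched as slicing two (18, 512) codes at layer 9 and concatenating the halves. This is a minimal illustration: the function name `crossover` and the dummy codes are assumptions, and the split index is a parameter so other splits can be tried.

```python
import numpy as np

def crossover(w_first, w_second, split=9):
    """Take layers [0, split) from w_first and layers [split, 18) from w_second."""
    if w_first.shape != w_second.shape:
        raise ValueError("latent codes must have the same shape")
    return np.concatenate([w_first[:split], w_second[split:]], axis=0)

# Dummy codes: zeros and ones make it easy to see which layers came from where.
w_row = np.zeros((18, 512))     # stand-in for the first-row image code
w_column = np.ones((18, 512))   # stand-in for the first-column image code

w_mixed = crossover(w_row, w_column, split=9)
print(w_mixed[:, 0])  # nine 0.0 values followed by nine 1.0 values
```

The mixed code would then be fed to the generator, with each of its 18 rows driving one AdaIN input.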

4.3. Expression Transfer and Face Reenactment

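Expression Transfer : changes the expression of image A to the expression of image B.

Expression transfer requires three latent vectors.

w_{1} : latent vector of the target image
w_{2} : latent vector of the source image with a neutral expression
w_{3} : latent vector of the source image with a more distinct expression

The latent vector is adjusted with the formula w = w_{1} + r (w_{3} - w_{2}), where r is a scalar that controls the strength. This method makes high-quality expression transfer possible.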
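The expression-transfer formula is simple vector arithmetic on latent codes: the difference between the two source codes is scaled by r and added to the target code. Below is a minimal sketch assuming (18, 512) codes in W+; the function and argument names are illustrative, and the random arrays merely stand in for real embeddings.

```python
import numpy as np

def expression_transfer(w_target, w_source, w_source_distinct, r=1.0):
    """Compute w = w_1 + r * (w_3 - w_2) for latent codes of equal shape."""
    return w_target + r * (w_source_distinct - w_source)

rng = np.random.default_rng(0)
w1 = rng.standard_normal((18, 512))  # target image code
w2 = rng.standard_normal((18, 512))  # source image code
w3 = rng.standard_normal((18, 512))  # source image code with a more distinct expression

w_new = expression_transfer(w1, w2, w3, r=0.5)
print(w_new.shape)  # (18, 512)
```

Setting r to 0 returns the target code unchanged, and increasing r moves the result further along the expression direction w_{3} - w_{2}.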

5. Embedding Algorithm

(To be added)

6. Conclusion

The paper proposes an efficient algorithm to embed a given image into the latent space of StyleGAN. This algorithm enables semantic image editing operations, such as image morphing, style transfer, and expression transfer.

Meaning

Important conclusions of our work are that embedding works best into the extended latent space W+ and that any type of image can be embedded.

Limitations

However, only the embedding of faces is semantically meaningful.

The method inherits image artifacts present in the pre-trained StyleGAN, which the authors illustrate in the supplementary materials.
The optimization takes several minutes; an embedding algorithm that works in under a second would be more appealing for interactive editing.

doublejy715 / Paper_review