Abstract

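This paper explores and analyzes the latent style space of StyleGAN2.
StyleGAN2 is a state-of-the-art image generator, and the analysis uses models pretrained on several different datasets.
First, the paper introduces StyleSpace, the space of channel-wise style parameters, and shows that it is more disentangled than the latent spaces explored in previous work.
Next, it describes a method for discovering a large collection of style channels, each of which controls a distinct visual attribute in a highly localized and disentangled manner.
Third, it proposes a simple method for identifying the style channels that control a specific attribute, using a pretrained classifier or a small number of example images.
Manipulating attributes through these StyleSpace controls is shown to be better disentangled than with the controls proposed in previous work; this is demonstrated with a newly proposed Attribute Dependency metric.
Finally, the paper demonstrates that StyleSpace controls can be applied to the manipulation of real images.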

1. Introduction

paragraph 1

Key requirement: controls over the model must be interpretable and disentangled.

GANs are widely used to generate realistic images. Making such image generators useful depends on understanding how the model works and on being able to control its output.
In particular, the controls must be interpretable and disentangled.

paragraph 2

Earlier work on building disentangled latent spaces

GAN architectures such as DCGAN and Progressive GAN share a common design: they start from a latent vector drawn at random from a simple distribution and turn it into a realistic image through a stack of convolutional layers.
More recently, style-based architectures have become increasingly popular. Here the latent vector is first passed through a mapping network to obtain an intermediate latent code, which then modulates the image as it passes through the convolutional layers by controlling channel-wise activation statistics in the generator.
BigGAN uses class-conditional BatchNorm, while StyleGAN uses AdaIN to adjust channel-wise means and variances.
StyleGAN2 instead controls channel-wise variances by modulating the weights of the convolution kernels. The intermediate latent space (the mapped latent space W) has been shown to be more disentangled than the initial latent space Z.
In addition, StyleGAN's latent space has been shown to be more disentangled than that of Progressive GAN.
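The difference between AdaIN and StyleGAN2's weight modulation can be sketched in NumPy as below. This is a minimal illustration under simplifying assumptions, not the official implementation: the function names `adain` and `modulate_demodulate` are hypothetical, the scale, bias, and style vectors are passed in directly (in StyleGAN they are produced from w by learned affine layers), and the convolution itself is omitted.

```python
import numpy as np

def adain(x, scale, bias, eps=1e-8):
    """AdaIN (StyleGAN): normalize each channel of x to zero mean and unit std,
    then apply a per-channel style scale and bias.
    x: feature map (C, H, W); scale, bias: (C,)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    return scale[:, None, None] * x_norm + bias[:, None, None]

def modulate_demodulate(weight, s, eps=1e-8):
    """StyleGAN2-style weight modulation and demodulation.
    weight: conv kernel (C_out, C_in, k, k); s: style vector (C_in,), one value per input channel."""
    w_mod = weight * s[None, :, None, None]                       # modulate: scale input channels
    norm = np.sqrt((w_mod ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w_mod / norm                                           # demodulate: unit-norm filters

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = adain(x, scale=np.full(4, 2.0), bias=np.zeros(4))  # every channel now has std ~2, mean ~0

kernel = rng.standard_normal((3, 4, 3, 3))
style = rng.uniform(0.5, 2.0, size=4)
kernel_mod = modulate_demodulate(kernel, style)  # each of the 3 output filters has L2 norm ~1
```

AdaIN normalizes the activations explicitly, whereas StyleGAN2 folds the style into the kernel. Under StyleGAN2's assumption of i.i.d. unit-variance inputs, demodulation brings the output activations back to unit variance, which is how it replaces the explicit normalization step.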

paragraph 3

Current approaches to attribute control and their limitations

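Some generators obtain control over their output through conditioning, which requires training the model on annotated data.
In contrast, with style-based designs, a variety of interpretable controls can be discovered after the generator has been trained. However, recent methods for finding them either rely on a pretrained classifier (or a large set of paired examples) or test a large number of candidate control directions, which limits their applicability.
Moreover, because of entanglement, adjusting a single attribute usually affects other attributes as well.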

paragraph 4

The goal of this paper is to understand the degree of disentanglement in style-based generator architectures.
A more important question is how to achieve disentangled control. The paper presents several ways to do so, either unsupervised or using only a small amount of supervision.

paragraph 5

The S latent space studied in this paper is more disentangled than the W/W+ latent spaces.

Recent work on disentangled representations considers a latent representation to be perfectly disentangled if each latent dimension controls a single attribute (disentanglement) and each attribute is controlled by a single dimension (completeness).
The paper examines the W and W+ latent spaces of StyleGAN2, and additionally tests StyleSpace, the S latent space of channel-wise style parameters.
Section 3 measures how disentangled and complete S is compared with these spaces.

paragraph 6

Brief overview of Section 4

Section 4 introduces a simple method for finding StyleSpace channels that control specific semantic regions of the image.
By computing gradient maps of the generated image with respect to the style parameters, the paper identifies channels that affect only a specific semantic region. The effectiveness of the method is demonstrated on three datasets (FFHQ, LSUN Bedroom, LSUN Car).

paragraph 7

Overview of a simple method

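The next goal is to identify the style channels that control a desired attribute. This requires a set of exemplar images that exhibit the attribute.
The basic idea is to compare the mean style vector of the exemplars with the mean over the whole population of generated images.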
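The comparison of exemplar and population style statistics can be sketched as follows. This is a hedged illustration: the scoring rule θ_u = |μ^e_u − μ^p_u| / σ^e_u (distance of the exemplar mean from the population mean, measured in units of the exemplars' own spread) is one plausible reading of the idea above, and the functions `channel_relevance` and `top_channels` are hypothetical; the paper's exact formula and normalization may differ.

```python
import numpy as np

def channel_relevance(pop_styles, exemplar_styles, eps=1e-8):
    """Score each style channel u by |mu_e[u] - mu_p[u]| / sigma_e[u].

    pop_styles:      (N, D) style vectors of randomly generated images (the population).
    exemplar_styles: (M, D) style vectors of the exemplar images showing the attribute.
    """
    mu_p = pop_styles.mean(axis=0)           # population mean per channel
    mu_e = exemplar_styles.mean(axis=0)      # exemplar mean per channel
    sigma_e = exemplar_styles.std(axis=0)    # exemplar spread per channel
    return np.abs(mu_e - mu_p) / (sigma_e + eps)

def top_channels(scores, k):
    """Indices of the k highest-scoring channels, best first."""
    return np.argsort(scores)[::-1][:k]

# Synthetic check: exemplars agree on channel 5 and differ there from the population.
rng = np.random.default_rng(0)
pop = rng.standard_normal((5000, 16))
exemplars = 0.3 * rng.standard_normal((20, 16))
exemplars[:, 5] += 3.0
scores = channel_relevance(pop, exemplars)
best = top_channels(scores, 3)  # channel 5 comes first
```

A channel scores high only when the exemplars agree on its value (small σ^e) and that value differs from a typical generated image (large |μ^e − μ^p|), which is why a handful of exemplars can be enough.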

2. Related Work

Understanding the latent representations of pretrained generators

3. Disentanglement of StyleGAN latent spaces

paragraph 1

Introducing StyleSpace (S)

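StyleGAN and StyleGAN2 generate an image from a sample drawn from a latent space. The first latent space, Z, is normally distributed. Random noise vectors z are transformed into an intermediate latent space W by a mapping network made of fully connected layers.
The W space is considered to better reflect the disentangled nature of the learned distribution. Each w is further transformed into channel-wise style parameters s by learned affine transformations, one for each layer of the generator, and these styles are fed into the corresponding layers. The paper calls the space of these style parameters StyleSpace (S).
Some papers also use another latent space, W+, in which a different w is fed to each layer. W+ is used for style mixing and image inversion.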
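The Z → W → S pipeline can be sketched with random weights as below. Everything here is a hypothetical stand-in: the weights are untrained, `layer_widths` is a toy list rather than the real generator's widths, and the affine biases starting at 1 are an assumption. Only the order of the transformations and the shapes are meant to mirror the description; the 8 fully connected layers with leaky ReLU follow the StyleGAN mapping-network design.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM = W_DIM = 512

# Hypothetical untrained mapping network: 8 fully connected layers with leaky ReLU.
mapping_layers = [rng.standard_normal((W_DIM, W_DIM)) / np.sqrt(W_DIM) for _ in range(8)]

def mapping(z):
    """Z -> W: normalize z, then apply the fully connected layers."""
    h = z / np.sqrt(np.mean(z ** 2) + 1e-8)
    for weight in mapping_layers:
        h = weight @ h
        h = np.where(h > 0, h, 0.2 * h)  # leaky ReLU, slope 0.2
    return h

# Hypothetical toy widths; one learned affine transform (A_l, b_l) per generator layer.
layer_widths = [512, 512, 256, 128]
affines = [(rng.standard_normal((c, W_DIM)) / np.sqrt(W_DIM), np.ones(c)) for c in layer_widths]

def to_styles(w):
    """W -> S: each layer gets its own channel-wise style vector s_l = A_l w + b_l."""
    return [A @ w + b for A, b in affines]

z = rng.standard_normal(Z_DIM)
w = mapping(z)
s = np.concatenate(to_styles(w))  # a point in the toy StyleSpace, 1408-dimensional
```

In W the same w feeds every affine transform; in W+ each layer's affine transform would receive its own w, while S is the concatenation of the resulting per-layer style vectors.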

paragraph 2

Latent dimensions of StyleGAN2 and of the S space

In StyleGAN2 there is one style parameter per channel, and it adjusts the corresponding feature map by modulating the convolution kernel weights.
Style parameters are also used by the tRGB blocks, which convert the feature maps at each resolution into an RGB image.
For the 1024×1024 StyleGAN2, which uses 18 per-layer w inputs, W is 512-dimensional, W+ is 9216-dimensional (512 × 18), and S is 9088-dimensional. Of these, 6048 dimensions are applied to feature maps and 3040 to the tRGB blocks (Appendix A of the paper gives details).
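The dimension counts above can be checked with a few lines of arithmetic. This sketch assumes the feature-map widths of the official 1024×1024 StyleGAN2 configuration (512 channels up to 64×64, then 256, 128, 64, 32) and that every modulated layer has one style parameter per input channel; `style_dims` is a hypothetical helper. Under those assumptions it reproduces the 6048 / 3040 / 9088 split.

```python
# Assumed feature-map widths of the official 1024x1024 StyleGAN2 generator, per resolution.
CHANNELS = {4: 512, 8: 512, 16: 512, 32: 512, 64: 512,
            128: 256, 256: 128, 512: 64, 1024: 32}
W_DIM, NUM_WS = 512, 18  # W is 512-d; W+ stacks 18 such vectors

def style_dims(channels):
    """Return (conv_styles, trgb_styles): one style parameter per *input* channel
    of every modulated conv and every tRGB layer."""
    res = sorted(channels)
    conv = channels[res[0]]   # single 3x3 conv at 4x4 (input: 512-channel constant)
    trgb = channels[res[0]]   # tRGB at 4x4
    for prev, cur in zip(res, res[1:]):
        conv += channels[prev]  # upsampling conv: input has the previous block's width
        conv += channels[cur]   # second conv at this resolution
        trgb += channels[cur]   # tRGB at this resolution
    return conv, trgb

conv_dims, trgb_dims = style_dims(CHANNELS)
W_PLUS_DIM = W_DIM * NUM_WS
print(conv_dims, trgb_dims, conv_dims + trgb_dims)  # 6048 3040 9088
print(W_PLUS_DIM)                                   # 9216
```

S ends up slightly smaller than W+ even though it has one entry per channel, because the high-resolution layers are narrow (64 and 32 channels).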

paragraph 3

What the DCI metrics measure and what they are trained on

The first goal is to determine which of these latent spaces offers the most disentangled representation.
To this end, the paper uses the DCI (disentanglement / completeness / informativeness) metrics, which make it possible to compare latent representations of different dimensionality. DCI trains regressors that map latent vectors to their corresponding attribute vectors.
Disentanglement measures the degree to which each latent dimension is related to only one attribute, and completeness measures the degree to which each attribute is controlled by a single latent dimension. Finally, informativeness measures how accurately the attributes can be classified from the latent representation.
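Given an importance matrix R, where R[i, j] says how much latent dimension i contributes to predicting attribute j (for example, the absolute weights of a trained regressor), the disentanglement and completeness scores of the DCI framework (Eastwood and Williams) can be computed as below. This is a sketch: how R is obtained, the uniform averaging of the per-attribute completeness scores, and the function names are assumptions, and informativeness (the classification accuracy) is left out.

```python
import numpy as np

def _entropy(p, base):
    """Entropy of each row of p, with logarithms in the given base."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1) / np.log(base)

def disentanglement(R):
    """R: (n_dims, n_attrs) non-negative importances, R[i, j] = importance of dim i for attr j.
    D_i = 1 - H_K(P_i.), with P_ij = R_ij / sum_k R_ik and K = n_attrs;
    D = sum_i rho_i * D_i, where rho_i is dim i's share of the total importance."""
    _, n_attrs = R.shape
    P = R / (R.sum(axis=1, keepdims=True) + 1e-12)
    d_i = 1.0 - _entropy(P, base=n_attrs)
    rho = R.sum(axis=1) / R.sum()
    return float((rho * d_i).sum())

def completeness(R):
    """C_j = 1 - H_D(P~_.j), with P~_ij = R_ij / sum_k R_kj and D = n_dims;
    averaged uniformly over attributes (an assumption)."""
    n_dims, _ = R.shape
    P = R / (R.sum(axis=0, keepdims=True) + 1e-12)
    c_j = 1.0 - _entropy(P.T, base=n_dims)
    return float(c_j.mean())

# Toy importance matrices: one-to-one versus fully mixed.
R_ideal = np.eye(4)
R_mixed = np.ones((4, 4))
print(disentanglement(R_ideal), completeness(R_ideal))  # both close to 1.0
print(disentanglement(R_mixed), completeness(R_mixed))  # both close to 0.0
```

A one-to-one importance matrix scores near 1 on both metrics, while a matrix where every dimension matters equally for every attribute scores near 0, matching the two definitions in the paragraph above.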

paragraph 4

Rather than studying the degree of disentanglement on a synthetically generated dataset, the paper uses a StyleGAN2 trained on a real dataset (FFHQ).
To produce training data for the DCI regressors, the paper applies 40 binary classifiers pretrained on the CelebA attributes.
These classifiers were trained to detect facial attributes such as hair color, smiling, and wearing lipstick, each framed as a binary classification.

paragraph 5

Data preparation

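First, 500K latent vectors are randomly sampled from Z, and their corresponding w and s vectors are recorded (along with the generated images). Each image is also annotated by the 40 classifiers, in the form of logits.
Not every attribute is well represented in the generated images, so the analysis is restricted to 31 attributes, using the images whose logits fall in the top 5% or bottom 5% for each attribute. In addition, for each attribute, the top 2% and bottom 2% of uncertainly classified images were removed.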

paragraph 6

Computing the DCI metrics and comparing the latent spaces Z, W, and S

In the left part of the table, both W and S achieve high scores, and S in particular scores higher on disentanglement and completeness. This indicates that S is a better space for controlling a single attribute with each dimension.

paragraph 7

Comparing W+ and S

Because W+ is commonly used for StyleGAN inversion, a separate experiment compares W+ with S.
500K random samples are drawn from the W latent space, and each w+ vector is formed by concatenating 18 randomly drawn w codes. Images generated from such vectors look somewhat unnatural, and only 25 of the original 31 attributes are observed in them.
The right side of Table 2 shows that S scores higher than W+.
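The difference between a W sample and the random W+ samples used in this experiment can be sketched as follows. The one-layer `sample_w` stand-in for the mapping network is hypothetical and untrained; only the sampling pattern (one shared w versus 18 independent ones) is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
W_DIM, NUM_WS = 512, 18  # 18 w vectors for the 1024x1024 generator

# Hypothetical stand-in for the mapping network (one random layer + leaky ReLU).
A = rng.standard_normal((W_DIM, W_DIM)) / np.sqrt(W_DIM)

def sample_w():
    """Draw z ~ N(0, I) and map it to a w vector."""
    h = A @ rng.standard_normal(W_DIM)
    return np.where(h > 0, h, 0.2 * h)

# A point of W, written in W+ layout: the same w repeated for all 18 layers.
w_in_W = np.tile(sample_w(), NUM_WS)

# A random W+ sample as in this experiment: 18 independently drawn w codes, concatenated.
w_plus = np.concatenate([sample_w() for _ in range(NUM_WS)])
```

Because the 18 codes in `w_plus` are unrelated to each other, different layers receive mutually inconsistent styles, which plausibly explains why such samples look less natural than images generated from a single w.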

4. Detecting locally-active style channels

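This section briefly explains how to detect StyleSpace channels that control local semantic regions.
The intuitive approach is to examine the gradient maps of the generated image with respect to the style channels ...

Figure 2 illustrates this with the gradient maps of two different channels.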
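Detecting locally-active channels can be sketched with a toy generator as below. Everything here is hypothetical: the two-channel `generate` function, the hand-made mask standing in for a semantic segmentation, and the 0.5 overlap threshold are assumptions for illustration. The gradient is estimated with central finite differences so the example needs only NumPy (a real generator would use automatic differentiation), and the overlap score, the fraction of gradient magnitude inside the mask, is one simple choice rather than necessarily the paper's exact measure.

```python
import numpy as np

H = W = 8
rng = np.random.default_rng(0)

# Hypothetical toy "generator": each style channel u adds a spatial pattern basis[u],
# followed by tanh. Channel 0 only touches the top-left quadrant; channel 1 touches everything.
basis = np.zeros((2, H, W))
basis[0, :H // 2, :W // 2] = 1.0
basis[1] = rng.uniform(0.5, 1.0, size=(H, W))

def generate(s):
    return np.tanh(np.tensordot(s, basis, axes=1))  # (H, W) image

def gradient_map(gen, s, u, eps=1e-4):
    """Per-pixel |dI/ds_u|, estimated with central finite differences."""
    e = np.zeros_like(s)
    e[u] = eps
    return np.abs(gen(s + e) - gen(s - e)) / (2 * eps)

def overlap(grad, mask):
    """Fraction of the gradient magnitude that falls inside the semantic mask."""
    return float((grad * mask).sum() / (grad.sum() + 1e-12))

s = np.array([0.3, -0.2])
mask = np.zeros((H, W))
mask[:H // 2, :W // 2] = 1.0  # stand-in for a region from a segmentation network
scores = [overlap(gradient_map(generate, s, u), mask) for u in range(len(s))]
locally_active = [u for u, sc in enumerate(scores) if sc > 0.5]  # hypothetical threshold
```

Channel 0 puts all of its gradient inside the mask and is flagged as locally active, while channel 1 spreads its gradient over the whole image and is not.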

doublejy715 / Paper_review

StyleSpace Analysis : Disentangled Controls for StyleGAN Image Generation #17
