doublejy715 / Paper_review

Paper_review
1 stars 0 forks source link

A Survey on Generative Adversarial Networks : Variants, Applications, and Training #33

Open doublejy715 opened 2 years ago

doublejy715 commented 2 years ago

Abstract

Generative Adversarial Networks(GAN)은 데이터 생성에서 뛰어난 모습을 보이고 있다. 많은 영역에서 쓰이고 있지만 여전히 안정적인 학습에는 어려움이 따른다. 문제점으로는 Nash-equilibrium, internal covariate shift, mode collapse, vanishing gradient, evaluation metrics의 부족 등이 있다. 본 survey에서는 GAN학습을 안정하게 하기 위한 training solution을 몇 가지 소개한다.

  1. original GAN model과 여기에서 파생된 version들을 소개한다.
  2. 여러 domain에서 적용된 GAN을 소개한다.
  3. GAN training시 문제점들을 해결하기 위한 논문들을 소개한다.

Keyworkds - GAN, Variants, Applications, Training

1. Introduction

Generative Adversarial Networks(GAN)

Generative Adversarial Networks(GAN)은 최근 unsupervised learning 방식으로 학습하여 뛰어난 결과를 내는 generatvie model로 언급된다. GAN은 대부분 semi-supervised 과 unsupervised learning 으로 학습한다. GAN takes a supervised learning approach to do unsupervised learning by generating fake or synthetic looking data.

GAN은 generator network(G)과 discriminator network(D)를 동시에 학습한다. D는 input data가 real / fake를 판단하는 binary classifier를 학습한다. 반대로 G는 D가 실제 데이터처럼 착각하도록 데이터를 생성하도록 학습한다.

GAN의 이용

GAN은 많은 영역에서 이용된다.hand-written font generation, anime characters generation, image blending, image in-painting, face agiing, text synthesis, human pose synthesis, stenographic applications, image manipulation applications, visual saliency prediction, object detection, 3D image synthesis, medical applications, facial makeup transfer, facial landmark dectection, image super-resolution, texture synthesis, sketch synthesis, image-to-image translation, face frontal view generation, language and speech syntehsis, music generation, video applications

GAN가 가진 문제들

GAN을 통해서 매우 실제적인 sample을 얻을 수 있지만, 여전히 GAN을 안정적으로 학습하는 것은 어려운 일이다. 그 이유는 G와 D는 alternating gradient descent(Alt-GD)이나 simultaneous gradient descent(Sim-GD)를 통해서 최적화 되기 때문이다.
GAN의 구조는 shortcomings, Nash-equilibrium, internal covariate shift, mode collapse, vanishing gradient, and lack of proper evaluation metrics 문제를 가지고 있다.

이 문제들을 해결하기 위해서 feature matching, unrolled GAN, mini-batch discrimination, historical averaging, two time-scale update rule, hybridmodel, self-attention GAN, relativistic GAN, label-smoothing, sampling GAN, proper optimizer, normalization techniques, add noise to inputs, using labels, alternative loss fuctions, gradient penalty, and cycle-consistency loss 방법들이 소개된다.

1.1 Structure of this survey

Table 1

# Subject Pub Year
1 Conditional Generative Adversarial Networks (CGAN) arXiv 2014
2 Deep Convolution Generative Adversarial Networks (DCGAN) CoRR 2015
3 Laplacian Generative Adversarial Networks (LapGAN) NIPS 2015
4 Information Maximizing Generative Adversarial Networks (InfoGAN) NIPS 2016
5 Energy-Based Generative Adversarial Networks (EBGAN) ICLR 2016
6 Wasserstein Generative Adversarial Networks (WGAN) ICML 2017
7 Boundary Equilibrium Generative Adversarial Networks (BEGAN) CSLG 2017
8 Progressive-Growing Generative Adversarial Networks (PGGAN) ICLR 2018
9 Big Generative Adversarial Networks (BigGAN) ICLR 2019
10 Style-Based Generator Architecture for Generative Adversarial Networks IEEE 2019

Table 3

Dom. Subject model names
Image Handwritten font generation DenseNet-CycleGAN / LS-CGAN / GlyphGAN
Image Anime characters generation DRA-GAN / PS-GAN
Image Image blending GP-GAN / GCC-GAN
Image Image in-painting Ex-GAN / PG-GAN
Image Face aging Age-CGAN / CAAE / IP-CGAN / Wavelet-GANs
Image Text synthesis AttnGAN / StackGAN / ACGAN / TACGAN / SISGAN
Image Human pose synthesis PG / Deformable-GAN
Image Stenographic applications S-GAN / SS-GAN / Stegano-GAN
Image Image manipulation IGAN / TAGAN / IAN / AttGAN / DGAN
Image Visual saliency prediction Sal-GAN / SalCapsule-CGAN / DSAL-GAN
Image Object detection SeGAN / PGAN / MTGAN / GANDO
Image 3D image synthesis 3DGAN / PrGAN
Image Medical applications SeGAN / MedGAN
Image Facial makeup transfer BGAN / PairedCycleGAN / DMTGAN
Image Face landmark detection SAN / Expose-GAN
Image Image super-resolution SR-GAN / ESR-GAN / SR-DGAN / T-GAN
Image Texture synthesis MGAN / SGAN /PSGAN
Image Sketch synthesis TGAN / Sketchy-GAN / CA-GAN
Image Image-to-image translation Pix2pix / PAN/ CycleGAN/ Disco-GAN / Dual-GAN / StarGAN / UNIT /MUNIT /DRIT / DRIT++
Image Face frontal view generation DRGAN / TPGAN / FFGAN /FTGAN
Audio Speech and audio synthesis Rank-GAN / VAW-GAN
Audio Music generation C-RNNGAN / Seq-GAN / ORGAN
video Video applications VGAN / MoCoGAN / DRNET / DVD-GAN
doublejy715 commented 2 years ago

2. Background

Generative models(GM)은 computer vision에 있어서 많은 발전을 가져다 주었다. Generative model은 training data ~ image를 가지고 unsupervised learning을 실시한다. 학습된 model은 training data와 비슷한 생성 분포를 가지게 되고, 새로운 sample data를 생성할 수 있게 된다.

왜 generative model 인가?

2.1 Autoencoder(AE)

수식 :

simple architecture AE

2.2 Variational Autoencoder(VAE)

AE vs VAE