41xu / some-reading

some papers and notes of interest

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis #1

Open 41xu opened 2 years ago

41xu commented 2 years ago

project: https://marcoamonteiro.github.io/pi-GAN-website/ [CVPR 2021]
paper: https://arxiv.org/pdf/2012.00926.pdf
code: https://github.com/marcoamonteiro/pi-GAN
contrast work: HoloGAN, GRAF
following work: (maybe ... a lot?) GRAM (to some extent?)
architecture & results:

[screenshots: pi-GAN architecture and result samples]
41xu commented 2 years ago

Notes on pi-GAN

These notes cover only the method section (and the background before it); implementation details and the code are left to be filled in later.

Background

π-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields.

3D-aware image synthesis learns neural scene representations from input 2D images in order to render view-consistent images from new camera poses.

core contribution: training a neural implicit GAN supervised by natural 2D data

most closely related work: sinusoidal representation networks (SIREN) and NeRF

A few earlier works that combine 3D information (3D-GAN, 3D representations) to generate 2D images: Visual Object Networks, PrGANs, HoloGAN, BlockGAN. These lack the expressiveness needed to synthesize high-fidelity images.

Most similar: GRAF

Method

$G_{\theta_G} (z, \xi)$, where z is the input noise and $\xi$ the camera pose. From the random noise and camera pose, the generator produces an implicit radiance field, from which a 2D image is rendered by volume rendering (somewhat like NeRF), rather than by a HoloGAN-style projection.

SIREN-based implicit radiance field

The 3D object is represented the same way as in NeRF: a coordinate and view direction $\vec x = (x,y,z), \vec d$ are passed through an MLP to obtain a neural radiance field, i.e. a density and a view-dependent color: $\sigma (\vec x): R^3 \to R, \ \vec c (\vec x, \vec d) : R^5 \to R^3$ (r, g, b). In between, a mapping network (inspired by StyleGAN) conditions the SIREN on z via a FiLM mapping.

Honestly the figure is clearer, so just look at it. Compared with NeRF: NeRF goes directly x, d → MLP → density, color, with no random noise z (NeRF is not a GAN, after all) and no extra mapping network (it just uses the plainest possible MLP).

[screenshot: pi-GAN generator architecture diagram]

$\phi_i: R^{M_i} \to R^{N_i}$ is one layer of the MLP:

$\Phi(\vec x) = \phi_{n-1} \circ \phi_{n-2} \circ \cdots \circ \phi_0(\vec x), \quad \phi_i(\vec x_i) = \sin(\gamma_i \cdot (W_i \vec x_i + b_i) + \beta_i)$

The mapping network takes the random noise z as input and outputs the per-layer FiLM parameters $\gamma, \beta$.
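
A minimal PyTorch sketch of this FiLM-conditioned SIREN and its mapping network (the layer widths, depth, and mapping-network architecture are my assumptions, and the official repo linked above additionally uses special frequency initialization that this omits):

```python
import torch
import torch.nn as nn

class FiLMSIRENLayer(nn.Module):
    """One FiLM-conditioned SIREN layer: phi_i(x) = sin(gamma * (W x + b) + beta)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, gamma, beta):
        # gamma scales the frequencies, beta shifts the phase (FiLM conditioning);
        # gamma/beta must broadcast against the layer output
        return torch.sin(gamma * self.linear(x) + beta)

class MappingNetwork(nn.Module):
    """Maps noise z to per-layer FiLM parameters (gamma, beta), StyleGAN-style."""
    def __init__(self, z_dim=256, hidden_dim=256, n_layers=8):
        super().__init__()
        self.n_layers, self.hidden_dim = n_layers, hidden_dim
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, 2 * n_layers * hidden_dim),
        )

    def forward(self, z):
        p = self.net(z).view(-1, self.n_layers, 2, self.hidden_dim)
        return p[:, :, 0], p[:, :, 1]  # gamma, beta: (batch, n_layers, hidden_dim)

# usage: condition one layer on a batch of 3D sample points
mapping = MappingNetwork()
layer = FiLMSIRENLayer(3, 256)
z = torch.randn(4, 256)
gamma, beta = mapping(z)                      # (4, 8, 256) each
x = torch.randn(4, 1024, 3)                   # 4 scenes, 1024 points each
h = layer(x, gamma[:, 0, None], beta[:, 0, None])  # params of layer 0
```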

Next, the Linear heads at the top right produce the density and color, and then, just like volume rendering in NeRF, ray colors can be rendered to obtain an image. The Linear layers here include a bias; the paper likewise writes $\sigma(\vec x)$ and $c(\vec x, \vec d)$ in $Wx + b$ form.
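
A sketch of those heads in the same spirit (note: in the official model the view direction enters through one more FiLM-conditioned layer before the color head; plain concatenation here is a simplification of mine, as is the sigmoid on the color):

```python
import torch
import torch.nn as nn

class RadianceHeads(nn.Module):
    """Linear (Wx + b) heads producing density sigma(x) and color c(x, d)."""
    def __init__(self, feat_dim=256, dir_dim=3):
        super().__init__()
        self.sigma_head = nn.Linear(feat_dim, 1)            # sigma(x): features -> density
        self.color_head = nn.Linear(feat_dim + dir_dim, 3)  # c(x, d): features + view dir -> rgb

    def forward(self, feat, d):
        sigma = self.sigma_head(feat)
        rgb = torch.sigmoid(self.color_head(torch.cat([feat, d], dim=-1)))
        return sigma, rgb
```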

The SIREN block mentioned above looks like this:

[screenshot: FiLM-conditioned SIREN block]

Neural Rendering

The neural rendering part is the same as in NeRF, so it is not repeated here. Overall, this work builds a GAN that inserts SIREN blocks when generating color and density (rather than the plain MLP NeRF uses) and feeds the random noise in as conditioning along the way. It also trains on purely 2D datasets and renders with a pinhole camera model and cast rays.
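
For reference, the NeRF quadrature being reused is $C(\vec r) \approx \sum_i T_i (1 - e^{-\sigma_i \delta_i}) \vec c_i$ with $T_i = \prod_{j<i} e^{-\sigma_j \delta_j}$. A minimal sketch of that standard rule (my paraphrase of NeRF's rendering, not code from the pi-GAN repo):

```python
import torch

def volume_render(sigma, rgb, deltas):
    """NeRF-style quadrature along each ray.

    sigma:  (..., N, 1) densities at N samples per ray
    rgb:    (..., N, 3) colors at the same samples
    deltas: (..., N, 1) distances between adjacent samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)  # per-sample opacity
    # transmittance T_i = prod_{j<i} (1 - alpha_j); prepend 1 for the first sample
    ones = torch.ones_like(alpha[..., :1, :])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10], dim=-2), dim=-2)
    weights = alpha * trans[..., :-1, :]
    return (weights * rgb).sum(dim=-2)  # (..., 3) pixel color per ray
```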

Discriminator

The discriminator uses a progressively growing structure (following ProgressiveGAN): training starts at low resolution with large batch sizes, then the resolution is gradually increased and new layers are added to D. This way the early low-resolution stages can afford large batch sizes, which helps stabilize training; a nice trick I hadn't used before Orz. In the end it goes from 32×32 up to 128 and 512. (G, however, does not grow progressively.)
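
A toy illustration of that schedule (the numbers are placeholders, not the paper's actual values, and `add_input_block` is a hypothetical helper):

```python
# Resolution doubles per stage while batch size shrinks; each stage
# grows the discriminator with a new input block for that resolution.
schedule = [(32, 128), (64, 64), (128, 32)]

for resolution, batch_size in schedule:
    # add_input_block(D, resolution)  # hypothetical: grow D for this stage
    print(f"stage: {resolution}x{resolution} images, batch size {batch_size}")
```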

Training details and some follow-up analysis will be added after going through the code.