41xu opened 2 years ago
These notes only cover the method section (and the background before it); implementation details and code are to be added later.
π-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields.
3D-aware image synthesis learns neural scene representations from input 2D images so that view-consistent images can be rendered from new camera poses.
core contribution: training a neural implicit GAN supervised by natural 2D data
most closely related work: sinusoidal representation networks (SIREN) and NeRF
Several earlier works combine 3D information (3D-GAN-style 3D representations) to generate 2D images: Visual Object Networks, PrGANs, HoloGAN, BlockGAN. These lack the expressiveness needed to synthesize high-fidelity images.
Most similar: GRAF
$G_{\theta_G} (z, \xi)$, where z is the input noise and $\xi$ the camera pose. From the random noise and camera pose, the generator produces an implicit radiance field, and a 2D image is then rendered from it by volume rendering (similar to NeRF), rather than by HoloGAN-style projection.
The 3D object representation is the same as in NeRF: a coordinate and view direction $\vec x = (x,y,z), \vec d$ are fed through an MLP to obtain a neural radiance field, i.e. a density and a view-dependent color, $\sigma (\vec x): R^3 \to R, \ \vec c (\vec x, \vec d) : R^5 \to R^3$ (r, g, b). In between sits a mapping network (inspired by StyleGAN) that maps z through FiLM layers to condition the SIREN backbone.
The figure makes this clearer. Compared with this, NeRF directly maps x, d → MLP → density, color: no random noise z (NeRF is not a GAN, after all) and no mapping network (just a plain MLP).
$\phi_i: R^{M_i} \to R^{N_i}$ is one layer of the MLP:
$\Phi(\vec x) = \phi_{n-1} \circ \phi_{n-2} \circ \dots \circ \phi_0(\vec x), \quad \phi_i(\vec x_i) = \sin(\gamma_i \cdot (W_i \vec x_i + b_i) + \beta_i)$
The mapping network takes the random noise z as input and outputs the FiLM parameters $\gamma, \beta$.
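The layer equation above can be sketched in a few lines. This is a minimal numpy sketch, not the paper's implementation: the mapping network's depth, widths, and activation are my assumptions (the real one is an MLP in the paper's code), and all weights here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_siren_layer(x, W, b, gamma, beta):
    """One FiLM-conditioned SIREN layer: phi_i(x) = sin(gamma * (W x + b) + beta)."""
    return np.sin(gamma * (x @ W.T + b) + beta)

def mapping_network(z, layers):
    """Toy mapping network (structure assumed, not the paper's): a small ReLU MLP
    on z whose final output is split into the FiLM parameters (gamma, beta)."""
    h = z
    for i, (W, b) in enumerate(layers):
        h = h @ W.T + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    gamma, beta = np.split(h, 2, axis=-1)
    return gamma, beta

# Usage with placeholder weights: condition an 8-unit SIREN layer on z.
z = rng.normal(size=4)                                   # latent noise
layers = [(rng.normal(size=(16, 4)), np.zeros(16)),
          (rng.normal(size=(2 * 8, 16)), np.zeros(2 * 8))]
gamma, beta = mapping_network(z, layers)
x = rng.normal(size=3)                                   # 3D coordinate
W, b = rng.normal(size=(8, 3)), np.zeros(8)
h = film_siren_layer(x, W, b, gamma, beta)               # features in [-1, 1]
```

Note how γ and β do not depend on the coordinate x: the same z conditions every spatial query, which is what makes the whole radiance field a function of the latent code.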
The features then pass through the Linear layers in the top-right of the figure to get density and color, after which volume rendering, exactly as in NeRF, produces the ray colors and hence the image. These Linear layers include a bias; the paper likewise writes $\sigma(\vec x), c(\vec x, \vec d)$ in the Wx+b form.
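A sketch of those two output heads, with placeholder random weights. Note this is simplified: if I recall correctly the paper's color branch passes the view direction through a further FiLM-SIREN layer before the final Linear, which I omit here.

```python
import numpy as np

rng = np.random.default_rng(1)
feat = np.sin(rng.normal(size=8))                 # stand-in for SIREN backbone features
d = rng.normal(size=3)
d /= np.linalg.norm(d)                            # unit view direction

# Density head: a Linear layer with bias, sigma = W_s f + b_s (view-independent).
W_s, b_s = rng.normal(size=(1, 8)), np.zeros(1)
sigma = feat @ W_s.T + b_s

# Color head: concatenate the view direction onto the features, then a
# Linear layer + sigmoid so the RGB output lands in (0, 1).
W_c, b_c = rng.normal(size=(3, 8 + 3)), np.zeros(3)
c = 1.0 / (1.0 + np.exp(-(np.concatenate([feat, d]) @ W_c.T + b_c)))
```

The split matches NeRF's design choice: density depends only on position, while color may depend on the viewing direction as well.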
The SIREN block mentioned above looks like this:
The Neural Rendering part is identical to NeRF, so I won't repeat it here. Overall, this work builds a GAN whose generator produces color and density with SIREN blocks (instead of NeRF's plain MLP), injecting the random noise as a condition along the way. And it trains purely on 2D datasets, rendering with a pinhole camera model and ray casting.
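For completeness, the NeRF-style compositing step that turns per-sample (σ, c) along a ray into a pixel color; a minimal sketch of the standard formula, not the paper's code.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """NeRF-style volume rendering for one ray: densities -> per-sample alphas,
    accumulate transmittance front to back, and blend the sample colors."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # light surviving to each sample
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# A near-opaque sample early on the ray dominates the pixel color:
colors = np.array([[1.0, 0.0, 0.0],   # red
                   [0.0, 1.0, 0.0],   # green
                   [0.0, 0.0, 1.0]])  # blue
pixel = composite_ray(np.array([50.0, 1.0, 1.0]), colors, np.ones(3))
# pixel is close to pure red: the first sample's weight is 1 - exp(-50) ≈ 1
```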
D uses a progressively growing structure (following ProgressiveGAN): training starts at low resolution with large batch sizes, then the resolution is gradually increased and new layers are added to D. The large batches affordable at low resolution help stabilize early training; a nice trick I hadn't used before. Resolution grows from 32×32 up to 128 and eventually 512. (G, however, does not grow progressively.)
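The growth schedule can be expressed as a simple step→stage lookup. The resolutions below follow the note (32 up to 128), but the batch sizes and step counts are made up for illustration, and the fade-in of new D layers is only mentioned in a comment, not implemented.

```python
# Illustrative progressive-growing schedule: (resolution, batch_size) pairs,
# large batches at low resolution, shrinking as resolution grows.
SCHEDULE = [(32, 128), (64, 64), (128, 32)]
STEPS_PER_STAGE = 10_000  # assumed, not from the paper

def stage_for_step(step):
    """Return (resolution, batch_size) for a training step; at each stage
    transition, new layers would be added to D and faded in gradually."""
    idx = min(step // STEPS_PER_STAGE, len(SCHEDULE) - 1)
    return SCHEDULE[idx]
```

So e.g. `stage_for_step(0)` gives the low-resolution, large-batch stage, and any step past the last transition stays at the final resolution.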
Training details and some follow-up analysis to be added after I read the code.
project: https://marcoamonteiro.github.io/pi-GAN-website/ [CVPR2021]
paper: https://arxiv.org/pdf/2012.00926.pdf
code: https://github.com/marcoamonteiro/pi-GAN
contrast work: HoloGAN, GRAF
following work: (maybe ... a lot?) GRAM (to some extent?)
architecture & results: