Lakonik / SSDNeRF

[ICCV 2023] Single-Stage Diffusion NeRF
https://lakonik.github.io/ssdnerf/
MIT License

Triplane Concatenation and Module Groups #21

Open Chrixtar opened 1 year ago

Chrixtar commented 1 year ago

Hello Hansheng,

Thank you very much for this clean codebase. Great work!

If I am not mistaken, the denoising UNet is the typical DDPM architecture, except that it takes concatenated triplanes as input instead of images. Geometrically, this concatenation, and the resulting kernel sharing within the convolutional layers, does not seem intuitive to me. Do you see what I mean, or should I elaborate?

In the code, I saw that you have also overridden all mmgen modules (MultiHeadAttention, DenoisingResBlock, etc.) to make them grouped operations. It seems that you also tried to denoise the planes individually. If so, I am very curious about the results, how they compare with denoising the triplanes jointly, and how you interpret them :)

Again, thanks for your efforts.

Best regards,
Chris
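To make the distinction between joint and per-plane (grouped) processing concrete, here is a minimal NumPy sketch. It is not the SSDNeRF or mmgen implementation: the shapes, the helper names `conv1x1` and `grouped_weight`, and the use of a 1x1 convolution are all assumptions chosen for illustration. It relies only on the fact that a grouped convolution is equivalent to a dense one with a block-diagonal weight, so each plane's output depends only on that plane's input.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 5, 5  # channels per plane and spatial size (illustrative values)
x = rng.standard_normal((3 * C, H, W))  # stacked triplane: 3 planes x C channels


def conv1x1(x, weight):
    """1x1 convolution; weight has shape (out_channels, in_channels)."""
    return np.einsum("oi,ihw->ohw", weight, x)


def grouped_weight(blocks):
    """Build a block-diagonal weight from per-group (C_out, C_in) blocks."""
    g, co, ci = blocks.shape
    full = np.zeros((g * co, g * ci))
    for k in range(g):
        full[k * co:(k + 1) * co, k * ci:(k + 1) * ci] = blocks[k]
    return full


dense_w = rng.standard_normal((3 * C, 3 * C))             # joint: planes mix
group_w = grouped_weight(rng.standard_normal((3, C, C)))  # grouped: one block per plane

# Perturb only the first plane, then check which output planes change.
x_perturbed = x.copy()
x_perturbed[:C] += 1.0

dense_delta = np.abs(conv1x1(x_perturbed, dense_w) - conv1x1(x, dense_w))
group_delta = np.abs(conv1x1(x_perturbed, group_w) - conv1x1(x, group_w))

# Joint processing: the other two planes' outputs are affected.
dense_mixes = bool(dense_delta[C:].max() > 1e-6)
# Grouped processing: the other two planes' outputs are unchanged.
grouped_isolated = bool(np.allclose(group_delta[C:], 0.0))
```

With the dense weight, every output channel sees all three planes. With the block-diagonal weight, the planes are processed independently, which corresponds to denoising each plane on its own.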

Lakonik commented 1 year ago

Hi Chris, thanks for your interest in our work!

We did try grouped operations in some early experiments, but to no avail. We have currently settled on either stacked triplanes (concatenating along the channel dimension) or tiled triplanes (a.k.a. rollout, i.e., concatenating along a spatial dimension).
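As a rough illustration of the two layouts mentioned above, here is a minimal NumPy sketch. The tensor shape (3 planes, 6 channels, 8x8 resolution), the function names, and the choice to tile along the width axis are assumptions for illustration. They are not taken from the SSDNeRF code.

```python
import numpy as np


def stack_triplanes(planes):
    """(3, C, H, W) -> (3*C, H, W): concatenate the planes along channels."""
    n, c, h, w = planes.shape
    return planes.reshape(n * c, h, w)


def tile_triplanes(planes):
    """(3, C, H, W) -> (C, H, 3*W): lay the planes side by side (rollout)."""
    return np.concatenate(list(planes), axis=-1)


# Dummy triplane with distinct values so that each plane is identifiable.
planes = np.arange(3 * 6 * 8 * 8, dtype=np.float32).reshape(3, 6, 8, 8)

stacked = stack_triplanes(planes)  # shape (18, 8, 8)
tiled = tile_triplanes(planes)     # shape (6, 8, 24)
```

In the stacked layout, a 2D convolution mixes all three planes at every pixel location. In the tiled layout, the same kernels slide over each plane in turn, and planes only interact near the seams or through attention layers.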