Triplane Concatenation and Module Groups

Hello Hansheng,

Thank you very much for this clean codebase, great work!

If I am not mistaken, the denoising UNet is the standard DDPM architecture, except that it expects concatenated triplanes as input instead of images. Geometrically, I find this concatenation, and the kernel sharing it causes within the convolutional layers, unintuitive. Do you see what I mean, or should I elaborate on this?
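In case it helps, here is a minimal sketch of one way to read the geometric concern. It assumes an axis convention that is not taken from the repository: pixel `(u, v)` of each plane fixes two world axes and leaves the third free. `PLANE_AXES`, `constraints`, and `common_point` are hypothetical names used only for illustration.

```python
# Assumed convention (not taken from the repository): pixel (u, v) of a plane
# fixes the two world axes listed here and leaves the remaining axis free.
PLANE_AXES = {"xy": ("x", "y"), "xz": ("x", "z"), "yz": ("y", "z")}


def constraints(plane, u, v):
    """World-axis values fixed by pixel (u, v) of the given plane."""
    first_axis, second_axis = PLANE_AXES[plane]
    return {first_axis: u, second_axis: v}


def common_point(u, v):
    """Return the 3D point shared by pixel (u, v) of all three planes, or None."""
    point = {}
    for plane in PLANE_AXES:
        for axis, value in constraints(plane, u, v).items():
            # setdefault stores the first value seen for an axis; a later
            # plane that demands a different value means there is no overlap.
            if point.setdefault(axis, value) != value:
                return None
    return (point["x"], point["y"], point["z"])


print(common_point(2, 2))
print(common_point(2, 5))
```

Under this assumed convention, the features that a shared kernel combines at one pixel location belong to a common 3D point only on the image diagonal; everywhere else they describe different regions of space.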

In the code, I have seen that you have also overridden all mmgen modules (MultiHeadAttention, DenoisingResBlock, etc.) to turn them into grouped operations. It seems that you have also tried denoising the planes individually. If this is the case, I am very curious about the results, how they compare with denoising the triplanes jointly, and your interpretation of them :)
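To be explicit about the distinction I have in mind, here is a minimal NumPy sketch, not the repository's implementation. It uses a 1x1 channel mixing as a stand-in for a convolution, and all shapes and names (`C`, `joint_mix`, `grouped_mix`, `changed_channels`) are assumptions for illustration. The joint variant mixes channels across all three planes; the grouped variant uses one kernel per plane, which is the behaviour a convolution with three groups would have.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8                        # hypothetical per-plane channels and resolution
planes = rng.normal(size=(3, C, H, W))   # stand-ins for the three feature planes
x = planes.reshape(3 * C, H, W)          # concatenate the planes along the channel axis

w_joint = rng.normal(size=(3 * C, 3 * C))  # one kernel that sees all planes
w_group = rng.normal(size=(3, C, C))       # one independent kernel per plane


def joint_mix(inp):
    """1x1 mixing over all 3*C channels: every output sees every plane."""
    return np.einsum("oc,chw->ohw", w_joint, inp)


def grouped_mix(inp):
    """1x1 mixing with 3 groups: each plane is processed on its own."""
    grouped = inp.reshape(3, C, H, W)
    out = np.einsum("goc,gchw->gohw", w_group, grouped)
    return out.reshape(3 * C, H, W)


def changed_channels(fn):
    """Perturb only the first plane and report which output channels react."""
    perturbed = x.copy()
    perturbed[:C] += 1.0
    diff = np.abs(fn(perturbed) - fn(x)).max(axis=(1, 2))
    return diff > 1e-9


print("joint:  ", changed_channels(joint_mix))
print("grouped:", changed_channels(grouped_mix))
```

In the grouped variant, a change in one plane leaves the outputs of the other two planes untouched, which is why I read the grouped modules as denoising the planes individually.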

Again, thanks for your efforts. Best regards, Chris

Lakonik / SSDNeRF
