Closed wenqsun closed 8 months ago
@wenqsun Hi, since we are interpreting the output features as Gaussians, and the number of Gaussians is determined by the output resolution, it maybe problematic if we are using latent space to compress the spatial dimensions. It may require different designs to make it work.
Thanks for your explanation. I think your concern makes sense, and I will try some designs.
Thanks for your great work!
I noticed that in your work, the image input is pixel-level (consistent with the prior work: splatter image). I am wondering if we can consider using the VAE to encode the image into the latent code and train the unet in the latent space. If not, what may be the concern or drawback?
Thanks for your reply!