fraunhoferhhi / gaussian_gan_decoder

Official implementation for: https://arxiv.org/abs/2404.10625
https://florian-barthel.github.io/gaussian_decoder/index.html

About inference #5

Closed yangqinhui0423 closed 1 month ago

yangqinhui0423 commented 2 months ago

Hello Author, thank you for your wonderful work. After reading the paper, I would like to know the approximate training and inference times of the Gaussian parameter decoder. Could you share them? Also, do we need to sample first for inference?

Florian-Barthel commented 2 months ago

For simplicity, we have trained the decoder on a single GPU, which takes about 8h for 100k iterations. During inference, it takes about 0.1 seconds to render a new ID. Here, the bottleneck is the position initialization, which is currently handled by sampling the density and then applying marching cubes. I am currently working on faster position sampling methods, so that higher framerates can be achieved during latent interpolation. Once the model is created, however, it achieves about 170 FPS. Does this answer your question? I'm not sure what you mean by "sample first".

yangqinhui0423 commented 2 months ago

> For simplicity, we have trained the decoder on a single GPU, which takes about 8h for 100k iterations. During inference, it takes about 0.1 seconds to render a new ID. Here, the bottleneck is the position initialization, which is currently handled by sampling the density and then applying marching cubes. I am currently working on faster position sampling methods, so that higher framerates can be achieved during latent interpolation. Once the model is created, however, it achieves about 170 FPS. Does this answer your question? I'm not sure what you mean by "sample first".

Thank you for your answer. My confusion was about the position initialization (Section 3.1 in the paper). My understanding of the whole process is: an MLP decodes the feature to get the opacity, marching cubes extracts a surface from that opacity, and then points are randomly sampled and interpolated on the surface, which gives the position initialization of the GS point cloud. Is my understanding correct? And if so, during inference we still need to repeat the process above to obtain the positions, don't we? By the way, the decoder that decodes the opacity here is pre-trained, so it should be unrelated to the decoder network in Section 3.2 (especially the part that decodes opacity)?
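The process I have in mind could be sketched roughly like this (a toy example, not the repo's code: the density field, grid resolution, and iso-level are illustrative stand-ins, and the actual implementation would run marching cubes, e.g. `skimage.measure.marching_cubes`, instead of the simple voxel threshold used here):

```python
import numpy as np

def decode_density(grid: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-trained MLP that maps 3D coordinates to density.

    Toy density field: high inside a sphere of radius 0.5 around the origin.
    """
    return 1.0 - np.linalg.norm(grid, axis=-1)

def init_positions(res: int = 32, iso: float = 0.5, n_points: int = 10000) -> np.ndarray:
    # 1) Sample the density decoder on a regular 3D grid.
    lin = np.linspace(-1.0, 1.0, res)
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1)
    density = decode_density(grid)

    # 2) Extract an approximate iso-surface. Marching cubes would yield a
    #    triangle mesh; here we simply keep voxels close to the iso-level.
    surface_mask = np.abs(density - iso) < (2.0 / res)
    surface_voxels = grid[surface_mask]

    # 3) Randomly sample (with sub-voxel jitter) on the surface to get the
    #    initial positions of the Gaussian point cloud.
    idx = np.random.randint(0, len(surface_voxels), size=n_points)
    jitter = (np.random.rand(n_points, 3) - 0.5) * (2.0 / res)
    return surface_voxels[idx] + jitter

positions = init_positions()
print(positions.shape)  # (10000, 3)
```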

Florian-Barthel commented 2 months ago

Yes, your explanation is correct. We have also tested some other position initializations; however, this method showed the best results. During inference, this has to be done as well. The idea I am currently testing is to train a small mapping network that warps the positions based on the input latent vector w. If this works, the inference speed should increase significantly.
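A minimal sketch of this mapping-network idea (the shapes, dimensions, and NumPy MLP are assumptions for illustration, not the actual implementation): the expensive density-sampling + marching-cubes step would run once to get base positions, and a small network would then warp them per latent code.

```python
import numpy as np

rng = np.random.default_rng(0)

class PositionWarp:
    """Tiny ReLU MLP mapping (position, w) -> position offset.

    All dimensions are assumptions; a real model would be trained, not
    randomly initialized.
    """

    def __init__(self, w_dim: int = 512, hidden: int = 64):
        self.W1 = rng.normal(0.0, 0.02, (3 + w_dim, hidden))
        self.W2 = rng.normal(0.0, 0.02, (hidden, 3))

    def __call__(self, positions: np.ndarray, w: np.ndarray) -> np.ndarray:
        # Concatenate each position with the shared latent vector w.
        w_tiled = np.broadcast_to(w, (len(positions), len(w)))
        x = np.concatenate([positions, w_tiled], axis=1)
        offset = np.maximum(x @ self.W1, 0.0) @ self.W2  # two-layer ReLU MLP
        return positions + offset  # warped positions for this identity

# Base positions come from the one-time density/marching-cubes initialization.
base_positions = rng.normal(0.0, 0.5, (1000, 3))
w = rng.normal(0.0, 1.0, 512)
warped = PositionWarp()(base_positions, w)
print(warped.shape)  # (1000, 3)
```

With such a network, only a cheap forward pass would be needed per new latent vector during interpolation, instead of re-running the full position initialization.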

Yes, the NeRF decoder also delivers the density. This could be a good initialization for the part of the Gaussian splatting decoder that decodes the opacity. Nevertheless, I believe it still makes sense to re-train the opacity decoder, as some Gaussian splats spread across large regions, while the NeRF decoder only describes the density at one specific coordinate in space.