autonomousvision / gaussian-opacity-fields

Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes
https://niujinshuchong.github.io/gaussian-opacity-fields/

Issues with Gaussian Opacity Fields (GOF) Applied to Limited-Viewpoint Images Resulting in Suboptimal Gaussians and Meshes #57

Closed JiajieLi7012 closed 4 days ago

JiajieLi7012 commented 2 weeks ago

Hello,

I've been working on applying the Gaussian Opacity Fields (GOF) approach to a scenario involving images with limited viewpoints. However, I found that the trained Gaussians contained noticeably more floaters than those obtained with the original 3DGS method. Do you have any insights on this observation? Additionally, the meshes extracted with GOF were of poor quality.

image

image

In contrast, the original 3DGS-based SuGaR method yielded significantly better Gaussians and meshes. This raises the question of whether GOF is best suited to scenarios with 360-degree image coverage. Could the issues I'm facing be related to the limited viewpoints of my images?

I would greatly appreciate any guidance or suggestions on how to optimize GOF for scenarios with limited viewpoints.

Thank you!

niujinshuchong commented 2 weeks ago

Hi, what do you mean by limited viewpoints? Could you also share a comparison with SuGaR?

JiajieLi7012 commented 2 weeks ago

Thank you for your prompt response. By "limited viewpoint images" I mean that the input images are extracted from a video, so they observe the scene from only a narrow range of angles, rather than covering nearly 360 degrees of perspectives as in a synthetic dataset. The Gaussian and mesh results of SuGaR are shown in the images below. Note that the full video includes movement of a hand and a red bucket: the GOF results shown earlier were obtained using only the frames before the bucket moved, with the hand masked out, while SuGaR was trained directly on the full video without any masking (which is why there appear to be two red buckets). That said, training GOF directly on the full video produced results very similar to those shown earlier.

image

Screenshot 2024-06-11 at 11 43 29 PM
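
Regarding the masking mentioned above: here is a minimal sketch of one common way to fold a binary mask into the photometric loss (hypothetical helper and tensor names, not the actual GOF training code):

```python
import torch

def masked_l1_loss(rendered, gt, mask):
    """L1 photometric loss restricted to unmasked pixels.

    rendered, gt: (3, H, W) tensors in [0, 1]
    mask: (1, H, W) binary tensor, 1 = static scene, 0 = hand / moving object
    """
    diff = (rendered - gt).abs() * mask
    # Normalize by the number of supervised values so heavily masked
    # frames neither dominate nor vanish from the average.
    return diff.sum() / (mask.sum() * rendered.shape[0] + 1e-8)
```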

niujinshuchong commented 2 weeks ago

Hi, the result of SuGaR is indeed better. Could you share some of the rendered images from training in the log_images folder? That might help in understanding the differences.

JiajieLi7012 commented 2 weeks ago

Sure. Below are the rendered images at iterations 100, 10,000, 20,000, and 30,000, respectively.

Iteration 100:

Iteration 10000:

Iteration 20000:

Iteration 30000:

niujinshuchong commented 2 weeks ago

Hi, thanks for sharing the results. The RGB renderings look good, but the depth and normal maps are noisy. That would be expected if the input images have limited baselines. You could try using a monocular prior to help the optimization of the geometry, as in our MonoSDF project: https://github.com/autonomousvision/monosdf.
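
For intuition, a minimal sketch of such a monocular normal prior (hypothetical tensor names; the L1-plus-angular form follows the normal loss used in MonoSDF):

```python
import torch.nn.functional as F

def monocular_normal_loss(rendered_normal, prior_normal, mask=None):
    """Penalize disagreement between rendered normals and normals
    predicted by an off-the-shelf monocular network.

    rendered_normal, prior_normal: (3, H, W) tensors
    mask: optional (1, H, W) validity mask (1 = supervise this pixel)
    """
    n_r = F.normalize(rendered_normal, dim=0)
    n_p = F.normalize(prior_normal, dim=0)
    # L1 difference plus an angular (1 - cosine) term, as in MonoSDF.
    per_pixel = (n_r - n_p).abs().sum(dim=0, keepdim=True) \
                + (1.0 - (n_r * n_p).sum(dim=0, keepdim=True))
    if mask is not None:
        return (per_pixel * mask).sum() / (mask.sum() + 1e-8)
    return per_pixel.mean()
```

The prior normals could come from, e.g., an Omnidata model run on each frame (the setup MonoSDF uses), with the loss added to the photometric loss at a small weight.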

JiajieLi7012 commented 2 weeks ago

Thanks for sharing your insights. Just to make sure we are on the same page: by "input images have limited baselines", do you mean that the input images cover only a limited range of views?

niujinshuchong commented 2 weeks ago

Do your input images have little camera motion?
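
One quick way to check is to measure the translation between successive camera centers in the COLMAP reconstruction; if those baselines are tiny relative to the scene depth, the geometry is weakly constrained even when the RGB renderings look fine. A minimal sketch, assuming the camera centers are already parsed into an array (hypothetical helper, not part of the GOF codebase):

```python
import numpy as np

def consecutive_baselines(camera_centers):
    """Distance between consecutive camera centers.

    camera_centers: (N, 3) array of camera positions in world space,
    ordered by capture time (e.g. parsed from COLMAP output).
    Values much smaller than the typical scene depth indicate low
    parallax, which makes depth and normal estimation ill-posed.
    """
    centers = np.asarray(camera_centers, dtype=np.float64)
    return np.linalg.norm(np.diff(centers, axis=0), axis=1)
```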

JiajieLi7012 commented 2 weeks ago

Yes.