OceanYing / OmniSeg3D-GS

3D Gaussian Splatting adapted version of OmniSeg3D (CVPR2024)

Possible to export the data? #2

Open · abrahamezzeddine opened this issue 3 months ago

abrahamezzeddine commented 3 months ago

Hello,

I am wondering if it is possible to export the segmentation point-wise, so that every splat belonging to a semantic segment is labeled across the whole splat model?

For example, an apple is a set of splats, and if SAM determines it is a semantic object, then those splats/points are the apple, regardless of which view I am looking from.

Is it possible to export/append that, or at least to export the segmentation result for the whole Gaussian splat?

thank you!

OceanYing commented 3 months ago

Good suggestion!

I guess that's exactly what I have implemented in our GUI. You can find more details and instructions in this section of the README.md.

It is highly recommended to train the full model and then use our GUI to segment out any object of interest as an individual .ply file, which can then be visualized in any GS viewer.

abrahamezzeddine commented 3 months ago

Thank you.

I am particularly interested in segmenting the complete Gaussian splat without user intervention. As I understand it, the GUI currently only segments what is visible in the current viewport?

Imagine if every XYZ point had its semantic ID, based on the mask, "permanently" stored after training. Then, whichever viewer we open the splat with, the semantic information would already be appended to the splat file.

Is that something you think is possible?

This would be like a zero-shot semantic segmentation of the complete point cloud/Gaussian splat.
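
To make the idea concrete, here is a rough sketch of what I mean, using the `plyfile` package (the `semantic_id` property name and the `labels` array are hypothetical, just for illustration):

```python
import numpy as np
from plyfile import PlyData, PlyElement  # pip install plyfile

def append_semantic_ids(in_path: str, out_path: str, labels: np.ndarray) -> None:
    """Copy a GS .ply, adding one `semantic_id` int property per vertex."""
    ply = PlyData.read(in_path)
    vertex = ply["vertex"].data  # structured array, one row per Gaussian
    assert len(vertex) == len(labels), "one label per Gaussian expected"
    # Rebuild the structured array with one extra int32 field appended.
    out = np.empty(len(vertex), dtype=vertex.dtype.descr + [("semantic_id", "i4")])
    for name in vertex.dtype.names:
        out[name] = vertex[name]
    out["semantic_id"] = labels.astype(np.int32)
    PlyData([PlyElement.describe(out, "vertex")], text=False).write(out_path)
```

A viewer that reads PLY properties by name should simply ignore the unknown field, so the splat file would stay loadable.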

OceanYing commented 3 months ago

Sure, your understanding is right. The GUI provided here aims at segmenting objects by selection.

In our OmniSeg3D, each XYZ point does have a permanent semantic ID, but it is a 16-dim float feature vector rather than a pre-defined class label. The cosine similarity between two XYZ points' features reveals the semantic correlation between them. After training, you can therefore use a global clustering algorithm (HDBSCAN, etc.) to segment the whole Gaussian splat into clusters according to feature similarity, with no dependence on 2D viewpoints.

However, since there is no notion of "granularity" in such clustering, a single global threshold may not give satisfactory segmentations for all objects of interest. For example, say there are two chairs (each with four legs) in a room. If we cluster the splat with HDBSCAN under a pre-defined threshold after training, one chair may be segmented as a whole object while the other is split into several parts. You can see this phenomenon in Figure 8 and the discussion in the subsection "Automatic Discretization" in the supplementary material of our paper.
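
As a rough sketch of that global clustering step (this is not an API of this repo; `features` stands for the (N, 16) per-Gaussian feature array you would extract after training, and `min_cluster_size` is just an illustrative knob):

```python
import numpy as np
import hdbscan  # pip install hdbscan

def cluster_gaussians(features: np.ndarray, min_cluster_size: int = 50) -> np.ndarray:
    """Cluster per-Gaussian semantic features into object clusters.

    features: (N, 16) array of the learned per-point feature vectors.
    Returns one integer cluster id per Gaussian; -1 marks noise.
    """
    # L2-normalize so that Euclidean distance is monotone in cosine
    # similarity: ||a - b||^2 = 2 - 2*cos(a, b) for unit vectors.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit(feats).labels_
```

Tuning `min_cluster_size` (or a distance threshold) is exactly where the granularity ambiguity described above shows up.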

Since the capability of a segmentation algorithm (such as the "zero-shot semantic segmentation" you mentioned) still depends on the human's intent, which can usually be expressed in language (e.g., "I want to segment out all the chair legs"), I think one possible solution is to incorporate language-related information as an extra regularization.