abrahamezzeddine opened this issue 3 months ago
I think what you have implemented is cool. From my understanding, you can associate the masks of one object across multiple views by leveraging the visibility information from COLMAP. This is quite similar to the idea of SAM3D, except that SAM3D takes a denser point cloud as input.
As a comparison, OmniSeg3D can also associate object masks across multiple views (even if the object is incomplete in some views), which is why we can achieve multi-view consistent segmentation, i.e., 3D segmentation. After training, you can render multi-view semantic feature maps; if you select one object on one of them (actually its feature), you can obtain the same object's mask in the other images.
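Conceptually, that lookup is just a feature-similarity test. Here is a minimal sketch, not our actual implementation; the `feature_maps` layout, the `masks_from_click` name, and the `sim_thresh` value are assumptions for illustration:

```python
import numpy as np

def masks_from_click(feature_maps, view_id, u, v, sim_thresh=0.9):
    """feature_maps: dict of view_id -> (H, W, D) rendered feature map.
    Returns a boolean mask per view for the object clicked at (u, v)."""
    query = feature_maps[view_id][v, u]                  # (D,) feature at the click
    query = query / (np.linalg.norm(query) + 1e-8)
    masks = {}
    for vid, fmap in feature_maps.items():
        f = fmap / (np.linalg.norm(fmap, axis=-1, keepdims=True) + 1e-8)
        sim = f @ query                                  # (H, W) cosine similarity
        masks[vid] = sim > sim_thresh                    # same object in every view
    return masks
```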
Besides, since the output of SAM may not be consistent across multi-view images, explicit association may be ambiguous. Instead, OmniSeg3D implicitly associates the masks in a feature field (in 3D space) using a hierarchical contrastive learning strategy, which respects all the input segmentation masks and results in a multi-view consistent hierarchical feature field for interactive segmentation.
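To make the training signal concrete, below is a toy, flat (non-hierarchical) version of the per-image contrastive idea: rendered features of pixels that SAM puts in the same mask are pulled together, features from different masks are pushed apart. The temperature and the pixel sampling are illustrative assumptions, and the hierarchical version additionally accounts for the mask levels rather than treating all masks as flat labels:

```python
import torch
import torch.nn.functional as F

def mask_contrastive_loss(features, mask_ids, temperature=0.1):
    """features: (N, D) rendered features of N sampled pixels from one image.
    mask_ids: (N,) SAM mask label of each sampled pixel."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.T / temperature                          # (N, N) pairwise similarity
    logits = sim - torch.eye(len(f), device=f.device) * 1e9   # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    same = mask_ids[:, None] == mask_ids[None, :]        # positives: same SAM mask
    pos = same & ~torch.eye(len(f), dtype=torch.bool, device=f.device)
    # InfoNCE-style: average log-likelihood of positives per anchor pixel
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```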
Hello,
I have created a script that utilizes the COLMAP data structure: via the sparse reconstruction, it determines which SAM masks belong to which frames.
For example, for a trash bin viewed from different viewpoints in a room, the code can group its masks across those views, and likewise for a tree seen from different angles, or any other object, even if the object goes out of frame. As long as COLMAP sees it, my masking program will connect the masks spatially, roughly along the lines of the sketch below.
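This is a simplified sketch of the idea, not the actual script; I assume pycolmap for reading the sparse model, and that `sam_masks` maps each image id to an (H, W) integer label map where -1 means "no mask". Each COLMAP 3D point carries a track of (image_id, keypoint_idx) observations, so masks in different frames that contain observations of the same 3D points likely show the same object:

```python
from collections import defaultdict
import pycolmap

def group_masks(sparse_dir, sam_masks, min_shared_points=20):
    rec = pycolmap.Reconstruction(sparse_dir)
    votes = defaultdict(int)          # (mask_a, mask_b) -> shared 3D point count
    for p3d in rec.points3D.values():
        seen = []                     # masks containing an observation of this point
        for el in p3d.track.elements:
            img = rec.images[el.image_id]
            x, y = img.points2D[el.point2D_idx].xy
            label = sam_masks[el.image_id][int(y), int(x)]
            if label >= 0:
                seen.append((el.image_id, label))
        for a in seen:                # vote for every pair of co-observing masks
            for b in seen:
                if a < b:
                    votes[(a, b)] += 1
    # masks sharing enough 3D points are linked as views of one object
    return [pair for pair, n in votes.items() if n >= min_shared_points]
```

The returned pairs can then be merged into object groups with a union-find pass; the `min_shared_points` threshold is just a guess at a reasonable noise filter.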
Is that something that would be useful for you? Is that what your code does at the moment?