Welcome to the Segment Any RGBD GitHub repository!
New! We have released the technical report! [arxiv]
Segment AnyRGBD is a toolbox to segment rendered depth images based on SAM! Don't forget to star this repo if you find it interesting!
[Demo: Input to SAM (RGB or Rendered Depth Image) · SAM Masks with Class and Semantic Masks · 3D Visualization for SAM Masks with Class and Semantic Masks]
We find that humans can naturally identify objects from the visualization of a depth map, so we first map the depth map ([H, W]) to RGB space ([H, W, 3]) with a colormap function, and then feed the rendered depth image into SAM. Compared to the RGB image, the rendered depth image ignores texture information and focuses on geometry information. Existing SAM-based projects such as SSA, Anything-3D, and SAM 3D all feed RGB images to SAM; we are the first to use SAM to extract geometry information directly. The following figures show that depth maps rendered with different colormap functions yield different SAM results.
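The rendering step described above can be sketched as follows. This is a minimal illustration, not the repo's exact implementation: it normalizes the depth map to [0, 1] and applies a matplotlib colormap (the `render_depth` helper and the random depth map are ours, for demonstration only).

```python
import numpy as np
from matplotlib import colormaps


def render_depth(depth, colormap="viridis"):
    """Map a depth map [H, W] to an RGB image [H, W, 3] via a colormap.

    `colormap` is any matplotlib colormap name; as noted above, different
    colormap choices can lead to different SAM masks on the rendered image.
    """
    d = depth.astype(np.float32)
    # Normalize to [0, 1], guarding against a constant depth map.
    d_min, d_max = d.min(), d.max()
    d = (d - d_min) / (d_max - d_min + 1e-8)
    rgb = colormaps[colormap](d)[..., :3]  # drop the alpha channel
    return (rgb * 255).astype(np.uint8)


# Stand-in for a real depth map; the rendered result is what gets fed to SAM.
depth = np.random.rand(480, 640)
rendered = render_depth(depth, "jet")
```

Rendering the same depth map with, e.g., `"viridis"` versus `"jet"` is an easy way to reproduce the colormap-sensitivity effect shown in the figures.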
In this repo, we provide two alternatives for users: feeding either the RGB image or the rendered depth image to SAM. In each mode, the user can obtain the semantic masks (one color per class) and the SAM masks with class labels. The overall structure is shown in the following figure. We use OVSeg for zero-shot semantic segmentation.
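One natural way to combine the two outputs is to give each class-agnostic SAM mask the majority class from the per-pixel semantic map. The sketch below illustrates that idea; the majority-vote rule and the `label_sam_masks` helper are our assumptions for illustration, not necessarily the repo's exact procedure.

```python
import numpy as np


def label_sam_masks(sam_masks, semantic_map):
    """Assign each SAM mask the majority class from a semantic map.

    `sam_masks`: list of boolean arrays [H, W], one per SAM mask.
    `semantic_map`: int array [H, W] of per-pixel class ids, e.g. produced
    by a zero-shot semantic segmentor such as OVSeg.
    Returns one class id per mask (-1 for an empty mask).
    """
    labels = []
    for mask in sam_masks:
        classes, counts = np.unique(semantic_map[mask], return_counts=True)
        labels.append(int(classes[np.argmax(counts)]) if counts.size else -1)
    return labels


# Toy example: two masks covering the top and bottom halves of a 2x2 image.
masks = [
    np.array([[True, True], [False, False]]),
    np.array([[False, False], [True, True]]),
]
semantic = np.array([[1, 1], [2, 2]])
labels = label_sam_masks(masks, semantic)  # one class id per SAM mask
```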
Please see the installation guide.
We provide the UI (`ui.py`) and example inputs (`/UI/`) to reproduce the above demos. We use the OVSeg checkpoint `ovseg_swinbase_vitL14_ft_mpt.pth` for zero-shot semantic segmentation and the SAM checkpoint `sam_vit_h_4b8939.pth`. Put both checkpoints under the root of this repo. Then simply try our UI on your own computer:
python ui.py
Simply click one of the Examples at the bottom and the example inputs will be filled in automatically. Then click 'Send' to generate and visualize the results. Inference takes around 2 minutes for ScanNet and around 3 minutes for SAIL-VOS 3D.
Please download SAIL-VOS 3D and ScanNet to try more demos.
This repo is developed based on OVSeg, which is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under CC-BY-NC; openclip is licensed under the license found in its repo; SAM is licensed under the Apache License.