Jun-CEN / SegmentAnyRGBD


SAD: Segment Any RGBD

🎉🎉🎉 Welcome to the Segment Any RGBD GitHub repository! 🎉🎉🎉

🚀🚀🚀 New! We have released the technical report! 🚀🚀🚀 [arxiv]


🤗🤗🤗 Segment Any RGBD is a toolbox to segment rendered depth images based on SAM! Don't forget to star this repo if you find it interesting!

Input to SAM (RGB or Rendered Depth Image) | SAM Masks with Class and Semantic Masks | 3D Visualization for SAM Masks with Class and Semantic Masks

🥳 Introduction

We find that humans can naturally identify objects from a visualization of the depth map, so we first map the depth map ([H, W]) to the RGB space ([H, W, 3]) with a colormap function and then feed the rendered depth image into SAM. Compared to the RGB image, the rendered depth image discards texture information and emphasizes geometry. SAM-based projects such as SSA, Anything-3D, and SAM 3D all feed RGB images to SAM. We are the first to use SAM to extract geometry information directly. The following figures show that depth maps rendered with different colormap functions yield different SAM results.
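The depth-to-RGB mapping described above can be illustrated with a minimal sketch. It assumes NumPy and uses a hypothetical five-stop colormap (`COLOR_STOPS`) and a hypothetical helper name (`render_depth`); the colormap functions actually used in this repo are not listed in this README, so treat this only as an illustration of the [H, W] to [H, W, 3] step.

```python
import numpy as np

# Hypothetical 5-stop colormap (blue -> cyan -> green -> yellow -> red).
# This is an assumption for illustration, not the repo's actual colormap.
COLOR_STOPS = np.array(
    [[0, 0, 255], [0, 255, 255], [0, 255, 0], [255, 255, 0], [255, 0, 0]],
    dtype=np.float64,
)


def render_depth(depth, stops=COLOR_STOPS):
    """Map a depth map of shape [H, W] to a uint8 RGB image of shape [H, W, 3]."""
    depth = np.asarray(depth, dtype=np.float64)
    d_min, d_max = depth.min(), depth.max()
    scale = d_max - d_min
    # Normalize depth to [0, 1]; a constant depth map maps to 0 everywhere.
    norm = (depth - d_min) / scale if scale > 0 else np.zeros_like(depth)
    # Linearly interpolate each colour channel along the colour stops.
    positions = np.linspace(0.0, 1.0, len(stops))
    channels = [np.interp(norm, positions, stops[:, c]) for c in range(3)]
    return np.stack(channels, axis=-1).round().astype(np.uint8)


depth = np.array([[0.5, 1.0], [2.0, 4.5]])  # toy depth map
rgb = render_depth(depth)
print(rgb.shape)  # (2, 2, 3)
```

The resulting array can be passed to any model that expects an RGB image. Swapping `COLOR_STOPS` for a different set of stops changes how depth differences appear as colour edges, which is one way to see why different colormaps can lead to different masks.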

😎 Method

In this repo, we provide two modes: feeding either the RGB images or the rendered depth images to SAM. In each mode, the user obtains the semantic masks (one color refers to one class) and the SAM masks with their classes. The overall structure is shown in the following figure. We use OVSeg for zero-shot semantic segmentation.
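This README does not spell out how a class is attached to each SAM mask. One plausible approach, shown here purely as an assumption, is a majority vote over the per-pixel semantic labels that fall inside each mask. The function name `label_masks_by_majority` and the toy arrays are hypothetical and are not taken from the repo.

```python
import numpy as np


def label_masks_by_majority(masks, semantic):
    """Give each boolean mask [H, W] the most frequent class id inside it.

    Assumption: `semantic` is an integer class map [H, W], e.g. from a
    semantic segmentation model. Empty masks get the label -1.
    """
    labels = []
    for mask in masks:
        classes = semantic[mask]
        if classes.size == 0:
            labels.append(-1)
            continue
        values, counts = np.unique(classes, return_counts=True)
        labels.append(int(values[np.argmax(counts)]))
    return labels


# Toy semantic map with three classes (0, 1, 2).
semantic = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 1]])

mask_a = np.zeros((3, 3), dtype=bool)
mask_a[:2, :2] = True   # covers classes 0, 0, 0, 1
mask_b = np.zeros((3, 3), dtype=bool)
mask_b[:, 2] = True     # covers classes 1, 1, 1
mask_c = np.zeros((3, 3), dtype=bool)
mask_c[2, :] = True     # covers classes 2, 2, 1

labels = label_masks_by_majority([mask_a, mask_b, mask_c], semantic)
print(labels)  # [0, 1, 2]
```

A vote of this kind keeps the mask boundaries from the mask generator and uses the semantic map only to name each region.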

🤩 Comparison

🔥 Demos

SAIL-VOS 3D Dataset

Input to SAM (RGB or Rendered Depth Image) | SAM Masks with Class and Semantic Masks | 3D Visualization for SAM Masks with Class and Semantic Masks

ScanNetV2 Dataset

Input to SAM (RGB or Rendered Depth Image) | SAM Masks with Class and Semantic Masks | 3D Visualization for SAM Masks with Class and Semantic Masks

⚙️ Installation

Please see the installation guide.

💫 Try Demo

🤗 Try Demo on Hugging Face

Hugging Face Spaces Hugging Face Spaces

🤗 Try Demo Locally

We provide the UI (ui.py) and example inputs (/UI/) to reproduce the above demos. We use the OVSeg checkpoint ovseg_swinbase_vitL14_ft_mpt.pth for zero-shot semantic segmentation and the SAM checkpoint sam_vit_h_4b8939.pth. Put both files under this repo, then try our UI on your own computer:

python ui.py 

Click one of the Examples at the bottom and the example inputs will be filled in automatically. Then click 'Send' to generate and visualize the results. Inference takes around 2 minutes for ScanNet and around 3 minutes for SAIL-VOS 3D.
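Before launching the UI, it can help to confirm that both checkpoint files are in place. The following is a small sketch that assumes the checkpoints sit directly in the repo root (the README only says "under this repo", so the exact location is an assumption); `missing_checkpoints` is a hypothetical helper and is not part of the repo.

```python
from pathlib import Path

# Checkpoint file names as given in this README.
REQUIRED_CHECKPOINTS = (
    "ovseg_swinbase_vitL14_ft_mpt.pth",
    "sam_vit_h_4b8939.pth",
)


def missing_checkpoints(repo_root="."):
    """Return the required checkpoint names not found directly in repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found; run: python ui.py")
```

Run it from the repo root; if it reports missing files, download them and place them there before starting `ui.py`.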

Data Preparation

Please download SAIL-VOS 3D and ScanNet to try more demos.

LICENSE

Shield: CC BY-NC 4.0

This repo is developed based on OVSeg, which is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


However, portions of the project are under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under CC BY-NC; openclip is licensed under the license at its repo; SAM is licensed under the Apache License.