
SAM2Point šŸ”„: Segment Any 3D as Videos

Official repository for the project "SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners".

[šŸŒ Webpage] [šŸ¤— HuggingFace Demo] [šŸ“– arXiv Report]

šŸ‘€ About SAM2Point

We introduce SAM2Point, a preliminary exploration that adapts Segment Anything Model 2 (SAM 2) to zero-shot, promptable 3D segmentation. Our framework supports various prompt types, including 3D points, boxes, and masks, and generalizes across diverse scenarios, such as 3D objects, indoor scenes, outdoor scenes, and raw LiDAR.


To the best of our knowledge, SAM2Point presents the most faithful implementation of SAM in 3D, demonstrating superior implementation efficiency, promptable flexibility, and generalization capability for 3D segmentation.


šŸŽ¬ Multi-directional Videos from SAM2Point

We showcase the multi-directional videos generated during SAM2Point's segmentation process:

3D Object

3D Indoor Scene

3D Outdoor Scene

3D Raw LiDAR

šŸ’Ŗ Get Started

Installation

Clone the repository:

   git clone https://github.com/ZiyuGuo99/SAM2Point.git
   cd SAM2Point

Create a conda environment:

   conda create -n sam2point python=3.10
   conda activate sam2point

SAM2Point requires Python >= 3.10, PyTorch >= 2.3.1, and TorchVision >= 0.18.1. Please follow the official instructions at https://pytorch.org/get-started/locally/ to install PyTorch and TorchVision.
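For reference, a minimal sketch of such an install on a CUDA 12.1 machine might look like the following (the CUDA version and wheel index are assumptions; pick the exact command for your setup from the PyTorch page):

   # Assumes a CUDA 12.1 build; adjust or drop the index URL for your setup
   pip install torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu121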

Install additional dependencies:

   pip install -r requirements.txt

Prepare SAM 2 and 3D Data Samples

Download the checkpoint of SAM 2:

   cd checkpoints
   bash download_ckpts.sh
   cd ..
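If the script cannot be run, a checkpoint can also be fetched manually. The sketch below assumes the release URL and the `sam2_hiera_large` variant from the official SAM 2 download script; check that script for the other variants:

   # Manual alternative to download_ckpts.sh (URL taken from the official SAM 2 release)
   wget -P checkpoints https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt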

We provide 3D data samples from different datasets for testing SAM2Point:

   gdown --id 1hIyjBCd2lsLnP_GYw-AMkxJnvNtyxBYq
   unzip data.zip

Alternatively, you can download the samples directly from this link.

Code for custom 3D input and prompts will be released soon.

Start Segmentation

Modify DATASET, SAMPLE_IDX, PROMPT_TYPE, and PROMPT_IDX in run.sh to specify the 3D input and prompt.
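As a rough illustration (the values below are hypothetical; run.sh itself defines the accepted names and ranges), a configuration for segmenting an indoor-scene sample with a point prompt might look like:

   # Hypothetical values for illustration only; see run.sh for the accepted options
   DATASET=S3DIS        # 3D input source, e.g., an indoor-scene dataset
   SAMPLE_IDX=0         # index of the provided data sample
   PROMPT_TYPE=point    # one of the supported prompt types: point, box, or mask
   PROMPT_IDX=0         # index of the predefined prompt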

Run the segmentation script:

   bash run.sh

The segmentation results will be saved under ./results/, and the corresponding multi-directional videos will be saved under ./video/.

āœ… Citation

If you find SAM2Point useful for your research or applications, please cite using this BibTeX:

@article{guo2024sam2point,
  title={SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners},
  author={Guo, Ziyu and Zhang, Renrui and Zhu, Xiangyang and Tong, Chengzhuo and Gao, Peng and Li, Chunyuan and Heng, Pheng-Ann},
  journal={arXiv preprint arXiv:2408.16768},
  year={2024}
}

šŸ§  Related Work

Explore our additional research on 3D, SAM, and Multi-modal Large Language Models: