Zhening Huang · Xiaoyang Wu · Xi Chen · Hengshuang Zhao · Lei Zhu · Joan Lasenby
TL;DR: OpenIns3D proposes a "mask-snap-lookup" scheme to achieve 2D-input-free 3D open-world scene understanding, which attains SOTA performance across datasets, even with fewer input prerequisites. 🚀✨
device to watch BBC news | furniture that is capable of producing music | Ma Long's domain of excellence |
most comfortable area to sit in the room | penciling down ideas during brainstorming | furniture offers recreational enjoyment with friends |
Please check the installation file to install OpenIns3D.
🔧 Data Preparation:
Run the following commands to prepare the Replica .ply files, predicted masks, and ground truth:
sh scripts/prepare_replica.sh
sh scripts/prepare_yoloworld.sh
📊 Open Vocabulary Instance Segmentation:
python openins3d/main.py --dataset replica --task OVIS --detector yoloworld
📈 Results Log:

| Task | AP | AP50 | AP25 | Log |
|---|---|---|---|---|
| Replica OVIS (in paper) | 13.6 | 18.0 | 19.7 | |
| Replica OVIS (this Code) | 15.4 | 19.5 | 25.2 | log |
🔧 Data Preparation:
Place the download-scannet.py script into the scripts directory.
Then run the following command to prepare the _vh_clean_2.ply files for validation sets, as well as instance ground truth, GT-masks, and detected masks:
sh scripts/prepare_scannet.sh
📊 Open Vocabulary Object Recognition:
python openins3d/main.py --dataset scannet --task OVOR --detector odise
📈 Results Log:

| Task | Top-1 Accuracy | Log |
|---|---|---|
| ScanNet_OVOR (in paper) | 60.4 | |
| ScanNet_OVOR (this Code) | 64.2 | log |
📊 Open Vocabulary Object Detection:
python openins3d/main.py --dataset scannet --task OVOD --detector odise
📊 Open Vocabulary Instance Segmentation:
python openins3d/main.py --dataset scannet --task OVIS --detector odise
📈 Results Log:

| Task | AP | AP50 | AP25 | Log |
|---|---|---|---|---|
| ScanNet_OVOD (in paper) | 17.8 | 28.3 | 36.0 | |
| ScanNet_OVOD (this Code) | 20.7 | 29.9 | 39.7 | log |
| ScanNet_OVIS (in paper) | 19.9 | 28.7 | 38.9 | |
| ScanNet_OVIS (this Code) | 23.3 | 34.6 | 42.6 | log |
🔧 Data Preparation:
Run the following command to prepare the S3DIS .ply files, predicted masks, and ground truth:
sh scripts/prepare_s3dis.sh
📊 Open Vocabulary Instance Segmentation:
python openins3d/main.py --dataset s3dis --task OVIS --detector odise
📈 Results Log:

| Task | AP | AP50 | AP25 | Log |
|---|---|---|---|---|
| S3DIS OVIS (in paper) | 21.1 | 28.3 | 29.5 | |
| S3DIS OVIS (this Code) | 22.9 | 29.0 | 31.4 | log |
🔧 Data Preparation:
Run the following command to prepare the STPLS3D .ply files, predicted masks, and ground truth:
sh scripts/prepare_stpls3d.sh
📊 Open Vocabulary Instance Segmentation:
python openins3d/main.py --dataset stpls3d --task OVIS --detector odise
📈 Results Log:

| Task | AP | AP50 | AP25 | Log |
|---|---|---|---|---|
| STPLS3D OVIS (in paper) | 11.4 | 14.2 | 17.2 | |
| STPLS3D OVIS (this Code) | 15.3 | 17.3 | 17.4 | log |
We also evaluate OpenIns3D with the Snap module replaced by the original RGBD images, keeping the rest of the design intact.
🔧 Data Preparation
sh scripts/prepare_replica.sh
sh scripts/prepare_replica2d.sh
sh scripts/prepare_yoloworld.sh
📊 Open Vocabulary Instance Segmentation
python openins3d/main.py --dataset replica --task OVIS --detector yoloworld --use_2d true
📈 Results Log

| Task | AP | AP50 | AP25 | Log |
|---|---|---|---|---|
| OpenMask3D | 13.1 | 18.4 | 24.2 | |
| Open3DIS | 18.5 | 24.5 | 28.2 | |
| OpenIns3D | 21.1 | 26.2 | 30.6 | log |
We demonstrate how to perform single-vocabulary instance segmentation similar to the teaser image in the paper. The key new feature is the introduction of a CLIP ranking and filtering module to reduce false-positive results. (Works best with RGBD but is also fine with SNAP.)
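The ranking-and-filtering idea can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: it assumes every mask proposal already has an image embedding and the query has a text embedding (for example from a CLIP-style encoder), and it scores them by cosine similarity. The names `cosine` and `rank_and_filter`, the `top_k` and `min_score` parameters, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_and_filter(mask_embeds, text_embed, top_k=3, min_score=0.25):
    """Rank mask proposals by similarity to the query and drop weak matches.

    mask_embeds: list of embeddings, one per mask proposal (assumed given).
    text_embed:  embedding of the query phrase (assumed given).
    Returns a list of (mask_index, score), best first.
    """
    scored = [(i, cosine(e, text_embed)) for i, e in enumerate(mask_embeds)]
    # Rank: highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Filter: keep at most top_k proposals that clear the score threshold.
    return [(i, s) for i, s in scored[:top_k] if s >= min_score]


# Toy vectors standing in for real embeddings (hypothetical values).
masks = [
    [1.0, 0.0, 0.0],  # points the same way as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.7, 0.7, 0.0],  # partially aligned with the query
]
query = [1.0, 0.0, 0.0]

result = rank_and_filter(masks, query, top_k=2, min_score=0.5)
print([index for index, _ in result])  # [0, 2]
```

In this toy case the orthogonal proposal is the kind of false positive the filter is meant to discard; the threshold and `top_k` values would need tuning on real data.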
Quick Start:
📥 Download the demo dataset by running:
sh scripts/prepare_demo_single.sh
🚀 Run the model by executing:
python zero_shot_single_voc.py
You can now view results like the teaser images in 2D or 3D.
ℹ️ Note: The mask module is not required for reproducing the results above, so make sure you have installed it according to the installation guide.
To perform zero-shot scene understanding:
📥 Download the scannet200_val.ckpt checkpoint from this link and place it in the third_party/ directory.
🚀 Run the model by executing python zero_shot.py and specify:
pcd_path: The path to the colored point cloud file.
vocab: A list of vocabulary terms to search for.
You can also use the following script to automatically set up the scannet200_val.ckpt checkpoint and download some sample 3D scans:
sh scripts/prepare_zero_shot.sh
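Because pcd_path is expected to point to a colored point cloud, it can be useful to check a .ply file before running inference. The sketch below reads only the PLY header and reports whether the vertex element declares `red`, `green`, and `blue` properties. It is an illustration using the standard library; the helper name `ply_has_vertex_colors` is hypothetical, and whether the repository's loader expects exactly these property names is an assumption.

```python
import os
import tempfile


def ply_has_vertex_colors(path):
    """Return True if the PLY header declares red, green and blue vertex properties."""
    names = set()
    in_vertex = False
    with open(path, "rb") as f:  # the header is ASCII even in binary PLY files
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file: " + str(path))
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                break  # stop before the (possibly binary) payload
            if parts[0] == "element":
                in_vertex = len(parts) > 1 and parts[1] == "vertex"
            elif parts[0] == "property" and in_vertex:
                names.add(parts[-1])  # the property name is the last token
    return {"red", "green", "blue"} <= names


# Tiny ASCII PLY written to a temporary folder, for demonstration only.
DEMO_PLY = """ply
format ascii 1.0
element vertex 1
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
0 0 0 255 0 0
"""

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.ply")
    with open(demo_path, "w") as f:
        f.write(DEMO_PLY)
    print(ply_has_vertex_colors(demo_path))  # True
```

Files that store colors under other property names (for example `diffuse_red`) would be reported as uncolored by this check.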
To perform zero-shot inference using the sample dataset (default with Replica vocabulary), run:
python zero_shot_multi_vocs.py --pcd_path data/demo_scenes/demo_scene_1.ply
📂 Results are saved under output/snap_demo/demo_scene_1_vis/image.
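To inspect the saved results from Python, a small helper along these lines could list the images in that folder. This is only a sketch: the helper name `list_result_images` is hypothetical, and the file extensions are an assumption, since the output format is not stated here.

```python
from pathlib import Path


def list_result_images(image_dir, suffixes=(".png", ".jpg", ".jpeg")):
    """Return image files in image_dir sorted by name; empty list if the folder is missing."""
    folder = Path(image_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Path taken from the line above; prints nothing if inference has not been run yet.
for image_path in list_result_images("output/snap_demo/demo_scene_1_vis/image"):
    print(image_path.name)
```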
To use a different 2D detector (🔍 ODISE works better on pcd-rendered images):
python zero_shot_multi_vocs.py --pcd_path data/demo_scenes/demo_scene_2.ply --detector yoloworld
📝 Custom Vocabulary: If you want to specify your own vocabulary list, add it with the --vocab flag as follows:
python zero_shot_multi_vocs.py \
--pcd_path 'data/demo_scenes/demo_scene_4.ply' \
--vocab "drawers" "lower table"
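The way the quoted terms in the command above reach the script can be illustrated with Python's standard argparse. This is a sketch, not the actual parser in zero_shot_multi_vocs.py: only the flag names `--pcd_path`, `--vocab`, and `--detector` and the detector values `odise` and `yoloworld` come from the commands in this README, while the defaults, the `choices` list, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build an illustrative parser for the flags used in this README (assumed layout)."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--pcd_path", required=True,
                        help="path to the colored point cloud file")
    # nargs="+" collects one or more terms; shell quoting keeps "lower table" as one term.
    parser.add_argument("--vocab", nargs="+", default=None,
                        help="vocabulary terms to search for")
    # The default detector here is an assumption made for this sketch.
    parser.add_argument("--detector", choices=["odise", "yoloworld"], default="odise",
                        help="2D detector to use")
    return parser


# Equivalent to the shell command above after the shell has removed the quotes.
args = build_parser().parse_args([
    "--pcd_path", "data/demo_scenes/demo_scene_4.ply",
    "--vocab", "drawers", "lower table",
])
print(args.vocab)  # ['drawers', 'lower table']
```

Quoting matters: without the quotes, `lower table` would be split into two separate vocabulary terms.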
If you find OpenIns3D and this codebase useful for your research, please cite our work as a form of encouragement. 😊
@article{huang2024openins3d,
title={OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation},
author={Zhening Huang and Xiaoyang Wu and Xi Chen and Hengshuang Zhao and Lei Zhu and Joan Lasenby},
journal={European Conference on Computer Vision},
year={2024}
}
The mask proposal model is modified from Mask3D, and we heavily used the easy setup version of it for MPM. Thanks again for the great work! 🙌 We also drew inspiration from LAR and ContrastiveSceneContexts when developing the code. 🚀