dk-liang / UniSeg3D

A Unified Framework for 3D Scene Understanding
Apache License 2.0

Wei Xu*, Chunsheng Shi*, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai

Huazhong University of Science and Technology
(*) equal contribution.

An official implementation of "A Unified Framework for 3D Scene Understanding".

[Project Page] [arXiv Paper]

News

Abstract

We propose UniSeg3D, a unified 3D segmentation framework that performs panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation within a single model. Most previous 3D segmentation approaches are specialized for a specific task, thereby limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six tasks into unified representations processed by the same Transformer. This facilitates inter-task knowledge sharing and therefore promotes comprehensive 3D scene understanding. To take advantage of multi-task unification, we enhance performance by leveraging task connections. Specifically, we design a knowledge distillation method and a contrastive learning method to transfer task-specific knowledge across different tasks. Benefiting from extensive inter-task knowledge sharing, UniSeg3D becomes more powerful. Experiments on three benchmarks, ScanNet20, ScanRefer, and ScanNet200, demonstrate that UniSeg3D consistently outperforms current SOTA methods, even those specialized for individual tasks. We hope UniSeg3D can serve as a solid unified baseline and inspire future work.

Overview

Introduction

TODO

Installation

Dataset

The directory structure after pre-processing should be as below:

UniSeg3D
├── data
│   ├── scannet
│   │   ├── meta_data
│   │   ├── batch_load_scannet_data.py
│   │   ├── load_scannet_data.py
│   │   ├── scannet_utils.py
│   │   ├── scans
│   │   ├── scans_test
│   │   ├── scannet_instance_data
│   │   ├── points
│   │   │   ├── xxxxx.bin
│   │   ├── instance_mask
│   │   │   ├── xxxxx.bin
│   │   ├── semantic_mask
│   │   │   ├── xxxxx.bin
│   │   ├── super_points
│   │   │   ├── xxxxx.bin
│   │   ├── uniseg3d_infos_val.pkl
│   │   ├── scannet_cls_embedding.pth
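
Before running inference, it can help to confirm that pre-processing produced the layout above. The following is a minimal sketch, not part of the released code: the helper name `check_layout` is hypothetical, and it assumes that `meta_data`, `scans`, `scans_test`, and `scannet_instance_data` are directories, which the tree suggests but does not state. It only checks that the listed entries exist and that each per-scene folder contains at least one `.bin` file.

```python
from pathlib import Path

# Entries copied from the directory tree above, relative to the UniSeg3D root.
# Assumption: names without a file extension are directories.
EXPECTED_DIRS = [
    "data/scannet/meta_data",
    "data/scannet/scans",
    "data/scannet/scans_test",
    "data/scannet/scannet_instance_data",
    "data/scannet/points",
    "data/scannet/instance_mask",
    "data/scannet/semantic_mask",
    "data/scannet/super_points",
]
EXPECTED_FILES = [
    "data/scannet/batch_load_scannet_data.py",
    "data/scannet/load_scannet_data.py",
    "data/scannet/scannet_utils.py",
    "data/scannet/uniseg3d_infos_val.pkl",
    "data/scannet/scannet_cls_embedding.pth",
]
# Folders that the tree shows holding per-scene xxxxx.bin files.
BIN_DIRS = ["points", "instance_mask", "semantic_mask", "super_points"]


def check_layout(root):
    """Return a list of problems found; an empty list means the layout matches."""
    root = Path(root)
    problems = []
    for rel in EXPECTED_DIRS:
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {rel}")
    for rel in EXPECTED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for name in BIN_DIRS:
        folder = root / "data" / "scannet" / name
        # Only report empty folders here; missing ones were reported above.
        if folder.is_dir() and not any(folder.glob("*.bin")):
            problems.append(f"no .bin files in: data/scannet/{name}")
    return problems


if __name__ == "__main__":
    import sys

    # Pass the UniSeg3D root as the first argument; defaults to the current directory.
    repo_root = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_layout(repo_root)
    print("\n".join(issues) if issues else "layout looks complete")
```

Run from the repository root, the script prints one line per missing entry, or a single confirmation line when everything is in place. It does not validate file contents.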

Inference

Models

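| Task      | PS        | SS        | IS        | Interactive | Referring | OVS        | Download (ckpt) |
|-----------|-----------|-----------|-----------|-------------|-----------|------------|-----------------|
| Dataset   | ScanNet20 | ScanNet20 | ScanNet20 | ScanNet20   | ScanRefer | ScanNet200 | -               |
| Metric    | PQ        | mIoU      | mAP       | AP          | mIoU      | AP         | -               |
| UniSeg3D  | 71.3      | 76.3      | 59.1      | 54.1        | 29.5      | 19.6       | link            |
| UniSeg3D* | 71.3      | 76.9      | 59.3      | 54.5        | 29.6      | 19.7       | link            |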

Experiment

Visualizations on six 3D segmentation tasks

🙏 Acknowledgement

We are thankful to SPFormer, OneFormer3D, Open3DIS and OMG-Seg for releasing their models and code as open-source contributions.

Citation

@article{xu2024unified,
  title={A Unified Framework for 3D Scene Understanding},
  author={Xu, Wei and Shi, Chunsheng and Tu, Sifan and Zhou, Xin and Liang, Dingkang and Bai, Xiang},
  journal={arXiv preprint arXiv:2407.03263},
  year={2024}
}