Jieqianyu / SGN

Implementation of IEEE TIP 2024 paper - "Camera-based Semantic Scene Completion with Sparse Guidance Network"
# Camera-based Semantic Scene Completion with Sparse Guidance Network


[Arxiv]

News

Abstract

Semantic scene completion (SSC) aims to predict the semantic occupancy of each voxel in the entire 3D scene from limited observations, which is an emerging and critical task for autonomous driving. Recently, many studies have turned to camera-based SSC solutions due to the richer visual cues and cost-effectiveness of cameras. However, existing methods usually rely on sophisticated and heavy 3D models to process the lifted 3D features directly, which are not discriminative enough for clear segmentation boundaries. In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues. First, to exploit depth-aware context and dynamically select sparse seed voxels, we redesign the sparse voxel proposal network to process points generated by depth prediction directly with the coarse-to-fine paradigm. Furthermore, by designing hybrid guidance (sparse semantic and geometry guidance) and effective voxel aggregation for spatial geometry cues, we enhance the feature separation between different categories and expedite the convergence of semantic propagation. Finally, we devise the multi-scale semantic propagation module for flexible receptive fields while reducing computation resources. Extensive experimental results on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate the superiority of our SGN over existing state-of-the-art methods. Even our lightweight version, SGN-L, achieves notable scores of 14.80% mIoU and 45.45% IoU on the SemanticKITTI validation set with only 12.5 M parameters and 7.16 G training memory.

Method

SGN.jpg
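
The abstract mentions that seed voxels are selected from points generated by depth prediction. As a rough illustration of that geometric step only, the sketch below back-projects a depth map with a pinhole camera model and marks the voxels hit by at least one point. It is not SGN's sparse voxel proposal network: the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`), the grid origin, the voxel size, and the grid shape are all hypothetical, and the camera-to-grid extrinsic transform is omitted for brevity.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, ...]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Lift every pixel with positive depth to a 3D point in the camera frame.

    Assumed convention: x points right, y points down, z points forward.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # treat non-positive depth as invalid
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def voxelize(points: Sequence[Point], origin: Point, voxel_size: float,
             grid_shape: Tuple[int, int, int]) -> Set[Voxel]:
    """Return the indices of all in-range voxels containing at least one point."""
    occupied: Set[Voxel] = set()
    for p in points:
        idx = tuple(int((c - o) // voxel_size) for c, o in zip(p, origin))
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            occupied.add(idx)
    return occupied


# Toy usage with made-up values (not the SemanticKITTI camera or grid).
toy_depth = [[2.0, 0.0],
             [2.0, 4.0]]
toy_points = backproject(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
toy_voxels = voxelize(toy_points, origin=(-2.0, -2.0, 0.0),
                      voxel_size=1.0, grid_shape=(4, 4, 4))
```

In this toy setup one pixel has invalid depth and one point falls outside the grid, so only two candidate voxels remain. A learned proposal network would score and refine such candidates rather than accept them as-is.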

Getting Started

Installation

Please refer to VoxFormer to create the base environment. Some extra packages also need to be installed.

Train SGN with 4 GPUs

./tools/dist_train.sh ./projects/configs/sgn/sgn-T-one-stage-guidance.py 4

Eval SGN with 4 GPUs

./tools/dist_test.sh ./projects/configs/sgn/sgn-T-one-stage-guidance.py ./path/to/ckpts.pth 4

Model Zoo

| Backbone | Dataset | Method | IoU | mIoU | Params (M) | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R50 | Sem.KITTI val/test | SGN-T | 46.21/45.42 | 15.32/15.76 | 28.2 | config | model |
| R50 | KITTI360 val/test | SGN-T | 47.50/47.06 | 19.07/18.25 | 28.2 | config | model |
| R18 | Sem.KITTI val/test | SGN-L | 45.45/43.71 | 14.80/14.39 | 12.5 | config | model |
| R18 | KITTI360 val/test | SGN-L | 46.67/46.64 | 17.11/16.95 | 12.5 | config | model |
| R50 | Sem.KITTI val/test | SGN-S | 43.60/41.88 | 14.55/14.01 | 28.2 | config | model |
| R50 | KITTI360 val/test | SGN-S | 46.13/46.22 | 18.29/17.71 | 28.2 | config | model |

Note that we used the checkpoints that performed best on the validation set during training to evaluate SGN on the test sets for both SemanticKITTI and SSCBench-KITTI-360 datasets.
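
For readers new to the two metric columns: in semantic scene completion, IoU usually measures class-agnostic occupancy (occupied vs. empty), while mIoU averages the per-class IoU over the semantic classes. The sketch below is a minimal pure-Python illustration of that convention, not the evaluation code of this repository. It assumes label `0` means empty and label `255` marks voxels to ignore, and it skips classes that appear in neither the prediction nor the ground truth, which benchmark implementations may handle differently.

```python
from typing import List, Tuple


def ssc_metrics(pred: List[int], target: List[int], num_classes: int,
                empty_label: int = 0, ignore_label: int = 255) -> Tuple[float, float]:
    """Return (completion IoU, semantic mIoU) for flattened voxel labels."""
    tp = fp = fn = 0                      # occupancy counts (class-agnostic)
    inter = [0] * num_classes             # per-class intersection
    union = [0] * num_classes             # per-class union

    for p, t in zip(pred, target):
        if t == ignore_label:             # voxel excluded from evaluation
            continue

        # Scene completion: occupied vs. empty.
        p_occ, t_occ = p != empty_label, t != empty_label
        if p_occ and t_occ:
            tp += 1
        elif p_occ:
            fp += 1
        elif t_occ:
            fn += 1

        # Semantic segmentation: per-class intersection and union.
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1

    denom = tp + fp + fn
    iou = tp / denom if denom else 0.0

    # Average over non-empty classes that occur at least once (simplification).
    class_ious = [inter[c] / union[c] for c in range(num_classes)
                  if c != empty_label and union[c] > 0]
    miou = sum(class_ious) / len(class_ious) if class_ious else 0.0
    return iou, miou


# Toy usage: six voxels, two semantic classes plus "empty".
toy_iou, toy_miou = ssc_metrics(pred=[0, 1, 1, 2, 2, 0],
                                target=[0, 1, 2, 2, 255, 1],
                                num_classes=3)
```

The function returns fractions in the range 0 to 1; the table above reports the same quantities as percentages.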

TODO

Acknowledgement

Many thanks to these excellent open source projects:

Citation

If you find this project helpful, please consider citing the following paper:

@article{mei2024camera,
  title={Camera-based 3d semantic scene completion with sparse guidance network},
  author={Mei, Jianbiao and Yang, Yu and Wang, Mengmeng and Zhu, Junyu and Ra, Jongwon and Ma, Yukai and Li, Laijian and Liu, Yong},
  journal={IEEE Transactions on Image Processing},
  year={2024},
  publisher={IEEE}
}