MSMDFusion

Official implementation of our CVPR'2023 paper "MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection", by Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, and Yu-Gang Jiang. MSMDFusion framework

Introduction

Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. This is challenging due to the difficulty of combining multi-granularity geometric and semantic features from two drastically different modalities. Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques. However, depth information is under-investigated in these approaches when lifting points into 3D space, thus 2D semantics can not be reliably fused with 3D points. Moreover, their multi-modal fusion strategy, which is implemented as concatenation or attention, either can not effectively fuse 2D and 3D information or is unable to perform fine-grained interactions in the voxel space. To this end, we propose a novel framework called MSMDFusion to tackle above problems.

Getting Started

Installation

For basic installation, please refer to getting_started.md for installation.

Notice:

spconv-2.x is required for its sparse_add op.
You should manually add mmcv register to spconv library file following this example

Data Preparation

Step 1: Please refer to the official site for prepare nuscenes data. After data preparation, you will be able to see the following directory structure:

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
|   |   ├── v1.0-trainval
│   │   ├── nuscenes_database
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_test.pkl
│   │   ├── nuscenes_dbinfos_train.pkl

Step 2: Download preprocessed virtual points samples(extraction code: 9xcb) and sweeps(extraction code: 2eg1) data. And put them under the above folder samples and sweeps, respectively, and rename them as FOREGROUND_MIXED_6NN_WITH_DEPTH.

Training and Evaluation

For training, you need to first train a pure LiDAR backbone, such as TransFusion-L. Then, you can merge the checkpoints from pretrained TransFusion-L and ResNet-50 as suggested here. We also provide a merged 1-st stage checkpoint here(extraction code: 69i7)

# 1-st stage training
sh ./tools/dist_train.sh ./configs/transfusion_nusc_voxel_L.py 8
# 2-nd stage training
sh ./tools/dist_train.sh ./configs/MSMDFusion_nusc_voxel_LC.py 8

Notice: When training the 1-st stage of TransFusion-L, please follow the copy-and-paste fade strategy as suggested here.

For evaluation, you can use the following command:

# Evaluation
sh ./tools/dist_test.sh ./configs/MSMDFusion_nusc_voxel_LC.py $ckpt_path$ 8 --eval bbox

For testing and making a submission to the leaderboard, please refer to the official site

Results

3D Object Detection on nuScenes	Model	Set	mAP	NDS
MSMDFusion	val	69.27	72.05	checkpoints
MSMDFusion	test	71.49	73.96	predictions
MSMDFusion-TTA	test	73.28	75.09	predictions

3D Object Tracking on nuScenes	Model	Set	AMOTA	AMOTP	Recall	Result Files
MSMDFusion	test	73.98	54.87	76.30	predictions

Citation

If you find our paper useful, please cite:

@InProceedings{Jiao_2023_CVPR,
    author    = {Jiao, Yang and Jie, Zequn and Chen, Shaoxiang and Chen, Jingjing and Ma, Lin and Jiang, Yu-Gang},
    title     = {MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {21643-21652}
}

Acknowlegement

We sincerely thank the authors of mmdetection3d, CenterPoint, TransFusion, MVP, BEVFusion and BEVFusion for open sourcing their methods.

SxJyJay / MSMDFusion

readme