SparseBEV

This is the official PyTorch implementation for our ICCV 2023 paper:

SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
Haisong Liu, Yao Teng, Tao Lu, Haiguang Wang, Limin Wang
Nanjing University, Shanghai AI Lab

Explainer in Chinese: https://zhuanlan.zhihu.com/p/654821380

News

2024-03-31: The code of SparseOcc is released at https://github.com/MCG-NJU/SparseOcc.
2023-12-29: Check out our new paper (https://arxiv.org/abs/2312.17118) to learn about SparseOcc, a fully sparse architecture for panoptic occupancy!
2023-10-20: We provide code for visualizing the predictions and the sampling points, as requested in #25.
2023-09-23: We release the native PyTorch implementation of sparse sampling. Use this version if you run into problems compiling the CUDA operators; it is only about 15% slower.
2023-08-21: We release the paper, code and pretrained weights.
2023-07-14: SparseBEV is accepted to ICCV 2023.
2023-02-09: SparseBEV-Beta achieves 65.6 NDS on the nuScenes leaderboard.

Model Zoo

Setting	Pretrain	Training Cost	NDS_val	NDS_test	FPS	Weights
r50_nuimg_704x256	nuImg	21h (8x2080Ti)	55.6	-	15.8	gdrive
r50_nuimg_704x256_400q_36ep	nuImg	28h (8x2080Ti)	55.8	-	23.5	gdrive
r101_nuimg_1408x512	nuImg	2d8h (8xV100)	59.2	-	6.5	gdrive
vov99_dd3d_1600x640_trainval_future	DD3D	4d1h (8xA100)	84.9	67.5	-	gdrive
vit_eva02_1600x640_trainval_future	EVA02	11d (8xA100)	85.3	70.2	-	gdrive

We use r50_nuimg_704x256 for ablation studies and r50_nuimg_704x256_400q_36ep for comparison with others.
We recommend using r50_nuimg_704x256 to validate new ideas because it trains faster and gives more stable results.
FPS is measured with AMD 5800X CPU and RTX 3090 GPU (without fp16).
Run-to-run noise is around 0.3 NDS.
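
To compare the Training Cost column across different hardware setups, it can help to convert each entry into GPU-hours. The following is a minimal sketch and not part of this repository: the helper name `gpu_hours` is hypothetical, and the string format it parses (for example `2d8h (8xV100)`) is inferred from the table above.

```python
import re


def gpu_hours(cost: str) -> int:
    """Convert a Training Cost entry such as '2d8h (8xV100)' to GPU-hours.

    Hypothetical helper: the accepted format is an assumption based on
    the entries in the Model Zoo table.
    """
    match = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?\s*\((\d+)x\w+\)", cost.strip())
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"unrecognized training cost: {cost!r}")
    days = int(match.group(1) or 0)
    hours = int(match.group(2) or 0)
    num_gpus = int(match.group(3))
    # Wall-clock hours multiplied by the number of GPUs used.
    return (days * 24 + hours) * num_gpus


print(gpu_hours("21h (8x2080Ti)"))  # 168
print(gpu_hours("2d8h (8xV100)"))   # 448
```

GPU-hours ignore differences between GPU models, so they are only a rough guide when comparing a 2080Ti run with an A100 run.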

Environment

Install PyTorch 2.0 + CUDA 11.8:

conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Or install PyTorch 1.10.2 + CUDA 10.2 for older GPUs:

conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch

Install other dependencies:

pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5
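
Because these packages are pinned to exact versions, a quick check after installation can catch mismatches early. The sketch below is not part of this repository. It uses the standard-library `importlib.metadata` module, and it assumes that the installed distribution names match the names used in the commands above. The helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Pins copied from the install commands above.
# Assumption: distribution names match the names passed to pip/mim.
EXPECTED = {
    "mmcv-full": "1.6.0",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "mmdet3d": "1.0.0rc6",
    "setuptools": "59.5.0",
    "numpy": "1.23.5",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version. PyTorch is left out on purpose, since its version string can carry a build suffix that depends on how it was installed.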

Install turbojpeg and pillow-simd to speed up data loading (optional but important):

sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1

Compile CUDA extensions:

cd models/csrc
python setup.py build_ext --inplace

Prepare Dataset

Download nuScenes from https://www.nuscenes.org/nuscenes and put it in data/nuscenes.
Download the generated info file from Google Drive and unzip it.
Folder structure:

data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_train_mini_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── nuscenes_infos_val_mini_sweep.pkl
├── samples
├── sweeps
├── v1.0-test
└── v1.0-trainval

These *.pkl files can also be generated with our script: gen_sweep_info.py.
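
Before starting a long training run, it may be worth confirming that the dataset directory matches the layout above. This is a minimal sketch, not a script shipped with the repository; the function name `missing_entries` is hypothetical, and the list of entries is copied from the folder structure shown above.

```python
from pathlib import Path

# Entries copied from the folder structure above.
EXPECTED_ENTRIES = [
    "maps",
    "nuscenes_infos_test_sweep.pkl",
    "nuscenes_infos_train_sweep.pkl",
    "nuscenes_infos_train_mini_sweep.pkl",
    "nuscenes_infos_val_sweep.pkl",
    "nuscenes_infos_val_mini_sweep.pkl",
    "samples",
    "sweeps",
    "v1.0-test",
    "v1.0-trainval",
]


def missing_entries(root="data/nuscenes"):
    """Return the expected files and folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under data/nuscenes:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only tests for presence. It does not verify that the archives were fully extracted or that the info files match the dataset version.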

Training

Download the pretrained weights and put them in the pretrain/ directory:

pretrain
├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth
└── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth

Train SparseBEV with 8 GPUs:

torchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256.py

Train SparseBEV with 4 GPUs (e.g., the last four GPUs of an 8-GPU machine):

export CUDA_VISIBLE_DEVICES=4,5,6,7
torchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256.py

The batch size for each GPU is scaled automatically according to the number of GPUs, so there is no need to modify batch_size in the config files.
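
One common way to implement this kind of scaling is to keep a fixed global batch size and divide it by the number of processes. The sketch below illustrates that idea under this assumption. It is not copied from train.py, and the function name `per_gpu_batch_size` and the example value of 8 are hypothetical.

```python
def per_gpu_batch_size(total_batch_size: int, world_size: int) -> int:
    """Split a fixed global batch size evenly across `world_size` GPUs.

    Assumption: the global batch size stays constant and each GPU
    receives an equal share.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if total_batch_size % world_size != 0:
        raise ValueError("total_batch_size must be divisible by world_size")
    return total_batch_size // world_size


# Example with a hypothetical global batch size of 8.
print(per_gpu_batch_size(8, 8))  # 1
print(per_gpu_batch_size(8, 4))  # 2
```

Under this scheme the effective batch size, and therefore the learning-rate schedule, stays the same whether training runs on 8 GPUs or 4.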

Evaluation

Single-GPU evaluation:

export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Multi-GPU evaluation:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Timing

FPS is measured with a single GPU:

export CUDA_VISIBLE_DEVICES=0
python timing.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
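
For readers who want to time their own modules in a comparable way, the general pattern is to run a few warm-up iterations, then time a fixed number of iterations and divide. The following is a generic sketch of that pattern and does not reproduce timing.py. The function name `measure_fps` and its parameters are hypothetical.

```python
import time


def measure_fps(step, warmup=10, iters=50, sync=None):
    """Return how many times per second `step()` completes.

    `sync` is an optional callable that blocks until queued work has
    finished. For GPU code this would typically be
    torch.cuda.synchronize, since CUDA kernels launch asynchronously.
    """
    # Warm-up iterations are excluded from the measurement.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()  # make sure all timed work has actually finished
    elapsed = time.perf_counter() - start
    return iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a model forward pass.
    fps = measure_fps(lambda: sum(range(100_000)))
    print(f"{fps:.1f} iterations per second")
```

Without the synchronization call, GPU timings tend to measure only kernel launch overhead and can look far faster than the real throughput.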

Visualization

Visualize the predicted bounding boxes:

python viz_bbox_predictions.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Visualize the sampling points (like Fig. 6 in the paper):

python viz_sample_points.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Acknowledgements

Many thanks to these excellent open-source projects:

3D Detection: DETR3D, PETR, BEVFormer, BEVDet, StreamPETR
2D Detection: AdaMixer, DN-DETR
Codebase: MMDetection3D, CamLiFlow
