
Unleash the Potential of Image Branch for Cross-modal 3D Object Detection

This is the official implementation of "Unleash the Potential of Image Branch for Cross-modal 3D Object Detection" (NeurIPS 2023). This repository is based on [OpenPCDet](https://github.com/open-mmlab/OpenPCDet).

Abstract: To achieve reliable and precise scene understanding, autonomous vehicles typically incorporate multiple sensing modalities to capitalize on their complementary attributes. However, existing cross-modal 3D detectors do not fully utilize the image domain information to address the bottleneck issues of the LiDAR-based detectors. This paper presents a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects. First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation. This approach enables the learning of local spatial-aware features from the image modality to supplement sparse point clouds. Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch, utilizing a succinct and effective point-to-pixel module. Extensive experiments and ablation studies validate the effectiveness of our method. Notably, we achieved the top rank in the highly competitive cyclist class of the KITTI benchmark at the time of submission.

Overview

Installation

Please refer to INSTALL.md for the installation instructions.

Pretrained Models

Here we present the 3D detection performance (R40 AP at moderate difficulty) on the val set of the KITTI dataset.

|        | training time | Car@R40 | Pedestrian@R40 | Cyclist@R40 | download   |
|--------|---------------|---------|----------------|-------------|------------|
| UPIDet | ~12 hours     | 86.10   | 68.67          | 76.70       | model-287M |
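
A downloaded checkpoint can be evaluated directly with test.py (the checkpoint path below is a placeholder for wherever you save the file):

```shell script
cd tools
# evaluate the released pretrained model (path is illustrative)
python test.py --cfg_file ./cfgs/kitti_models/upidet.yaml --ckpt /path/to/upidet_pretrained.pth
```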

Getting Started

Prepare KITTI Dataset

Please organize the downloaded KITTI dataset files as follows:

```
OpenPCDet
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & planes
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2
├── pcdet
├── tools
```
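
After organizing the files, the data infos are typically generated with the standard OpenPCDet command shown below; the dataset config path is the OpenPCDet default and may differ in this repository:

```shell script
# generate KITTI data infos and the gt database (standard OpenPCDet step)
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
```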

Prepare Waymo Open Dataset

* You should use mmdet3d to generate KITTI-format RGB images for the Waymo dataset, then link the image files into the kitti_format directory using the modified script tools/map_mmdet_waymo_image.py (see the sketch after this list).

* Install the official `waymo-open-dataset` by running the following commands: 
```shell script
pip3 install --upgrade pip
# waymo-open-dataset build for TensorFlow 2.5
pip3 install waymo-open-dataset-tf-2-5-0 --user
```
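
The remaining Waymo preparation is roughly sketched below; the commands follow the mmdet3d and OpenPCDet documentation, and the exact flags and paths may differ between versions, so treat this as a guide rather than a verified recipe:

```shell script
# extract KITTI-format data (including camera images) from the Waymo tfrecords with mmdet3d
# (run inside your mmdet3d checkout; flags follow mmdet3d's data-preparation docs)
python tools/create_data.py waymo --root-path ./data/waymo --out-dir ./data/waymo --workers 64 --extra-tag waymo

# link the extracted images into this repository's kitti_format directory
# (check the script itself for the expected paths/arguments)
python tools/map_mmdet_waymo_image.py

# generate the Waymo data infos, as in standard OpenPCDet
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos \
    --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml
```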

Training

Train with a single GPU:

```shell script
cd tools
python train.py --cfg_file ./cfgs/kitti_models/upidet.yaml
```

Train with multiple GPUs (here assuming 4 GPUs):

```shell script
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_train.sh 4 --cfg_file ./cfgs/kitti_models/upidet.yaml
```
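
train.py also accepts the usual OpenPCDet overrides for the batch size, number of epochs, and output tag; for example (the values below are illustrative):

```shell script
# override the config defaults for a particular run
python train.py --cfg_file ./cfgs/kitti_models/upidet.yaml --batch_size 8 --epochs 80 --extra_tag my_experiment
```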

Note: For the Waymo dataset, first check out the `waymo_lidar` branch to train the single-modal detector, then check out the `waymo` branch to train the cross-modal detector from the weights of the single-modal detector obtained in the first step.
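
A rough sketch of this two-stage Waymo procedure is shown below; the config file names and the use of --pretrained_model are assumptions based on the usual OpenPCDet workflow, so adapt them to the actual configs in each branch:

```shell script
# stage 1: train the single-modal (LiDAR-only) detector
git checkout waymo_lidar
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_train.sh 4 --cfg_file ./cfgs/waymo_models/upidet_lidar.yaml

# stage 2: train the cross-modal detector, initialized from the stage-1 weights
# (checkpoint path and epoch number are illustrative)
git checkout waymo
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_train.sh 4 --cfg_file ./cfgs/waymo_models/upidet.yaml \
    --pretrained_model ../output/waymo_models/upidet_lidar/default/ckpt/checkpoint_epoch_30.pth
```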

Testing

Test all the saved checkpoints with a single GPU:

```shell script
cd tools
python test.py --eval_all --cfg_file ./cfgs/kitti_models/upidet.yaml
```

Test all the saved checkpoints with multiple GPUs (here assuming 4 GPUs):

```shell script
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_test.sh 4 --eval_all --cfg_file ./cfgs/kitti_models/upidet.yaml
```

Test a specific checkpoint with multiple GPUs (here assuming 4 GPUs and that checkpoint_epoch_80 is your best checkpoint):

```shell script
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/dist_test.sh 4 --cfg_file ./cfgs/kitti_models/upidet.yaml --ckpt ../output/upidet/default/ckpt/checkpoint_epoch_80.pth
```

License

UPIDet is released under the Apache 2.0 license.

Acknowledgement

We sincerely appreciate the open-source projects that this work builds upon, in particular [OpenPCDet](https://github.com/open-mmlab/OpenPCDet), for providing valuable and high-quality code.

Citation

If you find this work useful in your research, please consider citing:

```
@inproceedings{zhang2024unleash,
    title={Unleash the potential of image branch for cross-modal 3d object detection},
    author={Zhang, Yifan and Zhang, Qijian and Hou, Junhui and Yuan, Yixuan and Xing, Guoliang},
    booktitle={Advances in Neural Information Processing Systems},
    volume={36},
    pages={51562--51583},
    year={2023}
}
```