PillarNeSt

The Official Implementation of PillarNeSt
Apache License 2.0

PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection


Figure: PillarNeSt architecture.

PillarNeSt is a robust pillar-based 3D object detector that achieves 66.9% mAP (state of the art without TTA or model ensembling) and 71.6% NDS on the nuScenes benchmark.
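For readers new to pillar-based detection, below is a minimal sketch of the pillarization step, not PillarNeSt's actual code: the grid extent, pillar size, and mean-pooling encoder are illustrative stand-ins for the learned per-pillar encoder. Points are binned into vertical pillars on a BEV grid and scattered into a 2D pseudo-image, which is what the scaled, pretrained 2D backbone then consumes.

```python
import numpy as np

def pillarize(points, x_range=(-51.2, 51.2), y_range=(-51.2, 51.2), pillar=0.2):
    """points: (N, 4) array of (x, y, z, intensity) -> (4, ny, nx) BEV pseudo-image."""
    nx = int(round((x_range[1] - x_range[0]) / pillar))
    ny = int(round((y_range[1] - y_range[0]) / pillar))
    # Assign each point to a pillar cell on the BEV grid.
    ix = np.floor((points[:, 0] - x_range[0]) / pillar).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, pts = ix[keep], iy[keep], points[keep]

    feat_dim = pts.shape[1]
    bev = np.zeros((feat_dim, ny, nx), dtype=np.float32)
    cnt = np.zeros((ny, nx), dtype=np.float32)
    for c in range(feat_dim):                   # mean point feature per pillar,
        np.add.at(bev[c], (iy, ix), pts[:, c])  # a stand-in for the learned
    np.add.at(cnt, (iy, ix), 1.0)               # per-pillar point encoder
    return bev / np.maximum(cnt, 1.0)

# 100k random points -> a (4, 512, 512) pseudo-image for the 2D backbone.
pts = np.random.uniform(-51.2, 51.2, size=(100_000, 4)).astype(np.float32)
print(pillarize(pts).shape)
```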

Figure: Visualization results.

News

Our paper was accepted by IEEE Transactions on Intelligent Vehicles (TIV) in April 2024.

Preparation

Model weights are available on Google Drive and Baidu Wangpan (password: 1111).
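To sanity-check a downloaded checkpoint before training or evaluation, a minimal sketch follows; the file name is hypothetical, and the mmdetection3d-style 'state_dict' nesting is an assumption about this repo's checkpoint format.

```python
import torch

# Hypothetical file name; use whichever .pth you downloaded.
ckpt = torch.load("pillarnest_base.pth", map_location="cpu")
# mmdetection3d-style checkpoints nest the weights under 'state_dict';
# fall back to the raw dict if the weights are stored flat.
state = ckpt.get("state_dict", ckpt)
print(f"{len(state)} parameter tensors")
for name, tensor in list(state.items())[:5]:
    print(f"{name}: {tuple(tensor.shape)}")
```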

Main Results

Results on nuScenes val set. (15e+5e means 15 epochs with GT-sample augmentation followed by 5 epochs without it, and likewise for 18e+2e; see the config sketch after the table.)

| Config | mAP | NDS | Schedule | Weights (Google Drive) | Weights (Baidu) |
|---|---|---|---|---|---|
| PillarNeSt-Tiny | 58.8% | 65.6% | 15e+5e | Google Drive | Baidu |
| PillarNeSt-Small | 61.7% | 68.1% | 15e+5e | Google Drive | Baidu |
| PillarNeSt-Base | 63.2% | 69.2% | 15e+5e | Google Drive | Baidu |
| PillarNeSt-Large | 64.3% | 70.4% | 18e+2e | Google Drive | Baidu |
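A hedged sketch of how such a "fade" schedule is commonly wired up in mmdetection3d-style configs: train with the GT-sample (ObjectSample) copy-paste augmentation, then resume for the final epochs with that transform removed. The transform names follow mmdet3d conventions and are assumptions about this repo's configs; the db_sampler is simplified.

```python
# Stage 1 (e.g. 15 epochs): standard pipeline including ObjectSample
# (GT-sample), which copy-pastes database objects into each scene.
train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=dict(type='DataBaseSampler')),
    dict(type='PointShuffle'),
]

# Stage 2 (final epochs): resume from the stage-1 checkpoint with the same
# pipeline minus ObjectSample, so the model adapts to real scenes only.
fade_pipeline = [t for t in train_pipeline if t['type'] != 'ObjectSample']

# In the stage-2 config, resume from the stage-1 weights (path hypothetical):
# load_from = 'work_dirs/pillarnest_base/epoch_15.pth'
```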

Results on nuScenes test set (without any TTA/model ensemble).

| Config | mAP | NDS |
|---|---|---|
| PillarNeSt-Base | 65.6% | 71.3% |
| PillarNeSt-Large | 66.9% | 71.6% |

Update:

TODO:

Contact

If you have any questions, feel free to open an issue or contact us at maoweixin@megvii.com (maowx2017@fuji.waseda.jp) or wangtiancai@megvii.com.

Citation

If you find PillarNeSt helpful in your research, please consider citing:

@ARTICLE{10495196,
  author={Mao, Weixin and Wang, Tiancai and Zhang, Diankun and Yan, Junjie and Yoshie, Osamu},
  journal={IEEE Transactions on Intelligent Vehicles}, 
  title={PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection}, 
  year={2024},
  volume={},
  number={},
  pages={1-10},
  keywords={Three-dimensional displays;Point cloud compression;Feature extraction;Detectors;Object detection;Task analysis;Convolution;Point Cloud;3D Object Detection;Backbone Scaling;Pretraining;Autonomous Driving},
  doi={10.1109/TIV.2024.3386576}}

PS:

Recently, our team has also explored the application of multi-modal large language models (MLLMs) to autonomous driving:

ADriver-I: A General World Model for Autonomous Driving

Figure: ADriver-I architecture.

@article{jia2023adriver,
  title={Adriver-i: A general world model for autonomous driving},
  author={Jia, Fan and Mao, Weixin and Liu, Yingfei and Zhao, Yucheng and Wen, Yuqing and Zhang, Chi and Zhang, Xiangyu and Wang, Tiancai},
  journal={arXiv preprint arXiv:2311.13549},
  year={2023}
}

PPS:

Our group is recruiting interns working on embodied intelligence. For details or to send a resume: maoweixin@megvii.com