PillarNeSt

The Official Implementation of PillarNeSt
Apache License 2.0

PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection


Figure: PillarNeSt architecture.

PillarNeSt is a robust pillar-based 3D object detector that achieves 66.9% mAP (state of the art without TTA or model ensembling) and 71.6% NDS on the nuScenes benchmark.
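For readers new to pillar-based detection, below is a minimal sketch of the pillarization step, not PillarNeSt's actual code: the grid extent, pillar size, and mean-pooling encoder are illustrative stand-ins for the learned per-pillar encoder. Points are binned into vertical pillars on a BEV grid and scattered into a 2D pseudo-image, which is what the scaled, pretrained 2D backbone then consumes.

```python
import numpy as np

def pillarize(points, x_range=(-51.2, 51.2), y_range=(-51.2, 51.2), pillar=0.2):
    """points: (N, 4) array of (x, y, z, intensity) -> (4, ny, nx) BEV pseudo-image."""
    nx = int(round((x_range[1] - x_range[0]) / pillar))
    ny = int(round((y_range[1] - y_range[0]) / pillar))
    # Assign each point to a pillar cell on the BEV grid.
    ix = np.floor((points[:, 0] - x_range[0]) / pillar).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, pts = ix[keep], iy[keep], points[keep]

    feat_dim = pts.shape[1]
    bev = np.zeros((feat_dim, ny, nx), dtype=np.float32)
    cnt = np.zeros((ny, nx), dtype=np.float32)
    for c in range(feat_dim):                   # mean point feature per pillar,
        np.add.at(bev[c], (iy, ix), pts[:, c])  # a stand-in for the learned
    np.add.at(cnt, (iy, ix), 1.0)               # per-pillar point encoder
    return bev / np.maximum(cnt, 1.0)

# 100k random points -> a (4, 512, 512) pseudo-image for the 2D backbone.
pts = np.random.uniform(-51.2, 51.2, size=(100_000, 4)).astype(np.float32)
print(pillarize(pts).shape)
```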

Figure: Visualization results.

News

Our paper was accepted by IEEE Transactions on Intelligent Vehicles (TIV) in April 2024.

Preparation

Model weights are available on Google Drive and Baidu Wangpan (password: 1111).
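To sanity-check a downloaded checkpoint before training or evaluation, a minimal sketch follows; the file name is hypothetical, and the mmdetection3d-style 'state_dict' nesting is an assumption about this repo's checkpoint format.

```python
import torch

# Hypothetical file name; use whichever .pth you downloaded.
ckpt = torch.load("pillarnest_base.pth", map_location="cpu")
# mmdetection3d-style checkpoints nest the weights under 'state_dict';
# fall back to the raw dict if the weights are stored flat.
state = ckpt.get("state_dict", ckpt)
print(f"{len(state)} parameter tensors")
for name, tensor in list(state.items())[:5]:
    print(f"{name}: {tuple(tensor.shape)}")
```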

Main Results

Results on nuScenes val set. (15e+5e means 15 epochs with GT-sample augmentation followed by 5 epochs without it, and likewise for 18e+2e; see the config sketch after the table.)

| Config | mAP | NDS | Schedule | Weights (Google Drive) | Weights (Baidu) |
|---|---|---|---|---|---|
| PillarNeSt-Tiny | 58.8% | 65.6% | 15e+5e | Google Drive | Baidu |
| PillarNeSt-Small | 61.7% | 68.1% | 15e+5e | Google Drive | Baidu |
| PillarNeSt-Base | 63.2% | 69.2% | 15e+5e | Google Drive | Baidu |
| PillarNeSt-Large | 64.3% | 70.4% | 18e+2e | Google Drive | Baidu |
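A hedged sketch of how such a "fade" schedule is commonly wired up in mmdetection3d-style configs: train with the GT-sample (ObjectSample) copy-paste augmentation, then resume for the final epochs with that transform removed. The transform names follow mmdet3d conventions and are assumptions about this repo's configs; the db_sampler is simplified.

```python
# Stage 1 (e.g. 15 epochs): standard pipeline including ObjectSample
# (GT-sample), which copy-pastes database objects into each scene.
train_pipeline = [
    dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='ObjectSample', db_sampler=dict(type='DataBaseSampler')),
    dict(type='PointShuffle'),
]

# Stage 2 (final epochs): resume from the stage-1 checkpoint with the same
# pipeline minus ObjectSample, so the model adapts to real scenes only.
fade_pipeline = [t for t in train_pipeline if t['type'] != 'ObjectSample']

# In the stage-2 config, resume from the stage-1 weights (path hypothetical):
# load_from = 'work_dirs/pillarnest_base/epoch_15.pth'
```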

Results on nuScenes test set (without any TTA/model ensemble).

| Config | mAP | NDS |
|---|---|---|
| PillarNeSt-Base | 65.6% | 71.3% |
| PillarNeSt-Large | 66.9% | 71.6% |

Update:

TODO:

Contact

If you have any questions, feel free to open an issue or contact us at maoweixin@megvii.com (maowx2017@fuji.waseda.jp) or wangtiancai@megvii.com.

Citation

If you find PillarNeSt helpful in your research, please consider citing:

@ARTICLE{10495196,
  author={Mao, Weixin and Wang, Tiancai and Zhang, Diankun and Yan, Junjie and Yoshie, Osamu},
  journal={IEEE Transactions on Intelligent Vehicles}, 
  title={PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection}, 
  year={2024},
  volume={},
  number={},
  pages={1-10},
  keywords={Three-dimensional displays;Point cloud compression;Feature extraction;Detectors;Object detection;Task analysis;Convolution;Point Cloud;3D Object Detection;Backbone Scaling;Pretraining;Autonomous Driving},
  doi={10.1109/TIV.2024.3386576}}

PS:

Recently, our team has also explored the application of multi-modal large language models (MLLMs) to autonomous driving:

ADriver-I: A General World Model for Autonomous Driving

Figure: ADriver-I architecture.

@article{jia2023adriver,
  title={Adriver-i: A general world model for autonomous driving},
  author={Jia, Fan and Mao, Weixin and Liu, Yingfei and Zhao, Yucheng and Wen, Yuqing and Zhang, Chi and Zhang, Xiangyu and Wang, Tiancai},
  journal={arXiv preprint arXiv:2311.13549},
  year={2023}
}

PPS:

Our group is recruiting interns working on embodied intelligence. For details or to send a resume: maoweixin@megvii.com