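Shaoyuan Xie<sup>1</sup>, Lingdong Kong<sup>2,3</sup>, Wenwei Zhang<sup>2,4</sup>, Jiawei Ren<sup>4</sup>, Liang Pan<sup>2</sup>, Kai Chen<sup>2</sup>, Ziwei Liu<sup>4</sup>
<sup>1</sup>University of California, Irvine
<sup>2</sup>Shanghai AI Laboratory
<sup>3</sup>National University of Singapore
<sup>4</sup>S-Lab, Nanyang Technological University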
RoboBEV is the first robustness evaluation benchmark tailored for camera-based bird's eye view (BEV) perception under natural data corruption and domain shift, both of which are highly likely to occur in real-world deployments.
[Common Corruption] - We investigate eight data corruption types that are likely to appear in driving scenarios, spanning four categories: (1) sensor failure, (2) motion & data processing, (3) lighting conditions, and (4) weather conditions.
[Domain Shift] - We benchmark the adaptation performance of BEV models under three kinds of shift: (1) city-to-city, (2) day-to-night, and (3) dry-to-rain.
[Figure: two panels of six-view camera examples, each showing FRONT_LEFT, FRONT, FRONT_RIGHT (top row) and BACK_LEFT, BACK, BACK_RIGHT (bottom row).]
Visit our project page to explore more examples. :blue_car:
:1st_place_medal: DeepVision, :2nd_place_medal: Ponyville Autonauts Ltd, :3rd_place_medal: CyberBEV
:1st_place_medal: SafeDrive-SSR, :2nd_place_medal: CrazyFriday, :3rd_place_medal: Samsung Research
:1st_place_medal: ViewFormer, :2nd_place_medal: APEC Blue, :3rd_place_medal: hm.unilab
:1st_place_medal: HIT-AIIA, :2nd_place_medal: BUAA-Trans, :3rd_place_medal: CUSTZS
:1st_place_medal: safedrive-promax, :2nd_place_medal: Ponyville Autonauts Ltd, :3rd_place_medal: HITSZrobodrive
In this initial version of RoboBEV, 11 BEV detection algorithms and 1 monocular 3D detection algorithm have been benchmarked under 8 corruption types across 3 severity levels.
Kindly refer to INSTALL.md for the installation details.
Our datasets are hosted by OpenDataLab.
OpenDataLab is a pioneering open data platform for the large AI model era, making datasets accessible. By using OpenDataLab, researchers can obtain free formatted datasets in various fields.
Kindly refer to DATA_PREPARE.md for details on preparing the nuScenes and nuScenes-C datasets.
Kindly refer to GET_STARTED.md to learn more about using this codebase.
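:triangular_ruler: Metrics: The nuScenes Detection Score (NDS) is used as the main indicator of model performance throughout our benchmark. The following two metrics are adopted to compare the robustness of different models:
- mCE (the lower the better): the average corruption error (in percent) of a candidate model relative to the baseline model, computed over all corruption types and the three severity levels.
- mRR (the higher the better): the average resilience rate (in percent) of a candidate model relative to its own "clean" performance, computed over all corruption types and the three severity levels.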
:gear: Notation: Symbol :star: denotes the baseline model adopted in mCE calculation. For more detailed experimental results, please refer to RESULTS.md.
Model | mCE (%) $\downarrow$ | mRR (%) $\uparrow$ | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
---|---|---|---|---|---|---|---|---|---|---|---|
DETR3D:star: | 100.00 | 70.77 | 0.4224 | 0.2859 | 0.2604 | 0.3177 | 0.2661 | 0.4002 | 0.2786 | 0.3912 | 0.1913 |
DETR3D<sub>CBGS</sub> | 99.21 | 70.02 | 0.4341 | 0.2991 | 0.2685 | 0.3235 | 0.2542 | 0.4154 | 0.2766 | 0.4020 | 0.1925 |
BEVFormer<sub>Small</sub> | 101.23 | 59.07 | 0.4787 | 0.2771 | 0.2459 | 0.3275 | 0.2570 | 0.3741 | 0.2413 | 0.3583 | 0.1809 |
BEVFormer<sub>Base</sub> | 97.97 | 60.40 | 0.5174 | 0.3154 | 0.3017 | 0.3509 | 0.2695 | 0.4184 | 0.2515 | 0.4069 | 0.1857 |
PETR<sub>R50-p4</sub> | 111.01 | 61.26 | 0.3665 | 0.2320 | 0.2166 | 0.2472 | 0.2299 | 0.2841 | 0.1571 | 0.2876 | 0.1417 |
PETR<sub>VoV-p4</sub> | 100.69 | 65.03 | 0.4550 | 0.2924 | 0.2792 | 0.2968 | 0.2490 | 0.3858 | 0.2305 | 0.3703 | 0.2632 |
ORA3D | 99.17 | 68.63 | 0.4436 | 0.3055 | 0.2750 | 0.3360 | 0.2647 | 0.4075 | 0.2613 | 0.3959 | 0.1898 |
BEVDet<sub>R50</sub> | 115.12 | 51.83 | 0.3770 | 0.2486 | 0.1924 | 0.2408 | 0.2061 | 0.2565 | 0.1102 | 0.2461 | 0.0625 |
BEVDet<sub>R101</sub> | 113.68 | 53.12 | 0.3877 | 0.2622 | 0.2065 | 0.2546 | 0.2265 | 0.2554 | 0.1118 | 0.2495 | 0.0810 |
BEVDet<sub>R101-pt</sub> | 112.80 | 56.35 | 0.3780 | 0.2442 | 0.1962 | 0.3041 | 0.2590 | 0.2599 | 0.1398 | 0.2073 | 0.0939 |
BEVDet<sub>SwinT</sub> | 116.48 | 46.26 | 0.4037 | 0.2609 | 0.2115 | 0.2278 | 0.2128 | 0.2191 | 0.0490 | 0.2450 | 0.0680 |
BEVDepth<sub>R50</sub> | 110.02 | 56.82 | 0.4058 | 0.2638 | 0.2141 | 0.2751 | 0.2513 | 0.2879 | 0.1757 | 0.2903 | 0.0863 |
BEVerse<sub>SwinT</sub> | 110.67 | 48.60 | 0.4665 | 0.3181 | 0.3037 | 0.2600 | 0.2647 | 0.2656 | 0.0593 | 0.2781 | 0.0644 |
BEVerse<sub>SwinS</sub> | 117.82 | 49.57 | 0.4951 | 0.3364 | 0.2485 | 0.2807 | 0.2632 | 0.3394 | 0.1118 | 0.2849 | 0.0985 |
PolarFormer<sub>R101</sub> | 96.06 | 70.88 | 0.4602 | 0.3133 | 0.2808 | 0.3509 | 0.3221 | 0.4304 | 0.2554 | 0.4262 | 0.2304 |
PolarFormer<sub>VoV</sub> | 98.75 | 67.51 | 0.4558 | 0.3135 | 0.2811 | 0.3076 | 0.2344 | 0.4280 | 0.2441 | 0.4061 | 0.2468 |
SRCN3D<sub>R101</sub> | 99.67 | 70.23 | 0.4286 | 0.2947 | 0.2681 | 0.3318 | 0.2609 | 0.4074 | 0.2590 | 0.3940 | 0.1920 |
SRCN3D<sub>VoV</sub> | 102.04 | 67.95 | 0.4205 | 0.2875 | 0.2579 | 0.2827 | 0.2143 | 0.3886 | 0.2274 | 0.3774 | 0.2499 |
Sparse4D<sub>R101</sub> | 100.01 | 55.04 | 0.5438 | 0.2873 | 0.2611 | 0.3310 | 0.2514 | 0.3984 | 0.2510 | 0.3884 | 0.2259 |
SOLOFusion<sub>short</sub> | 108.68 | 61.45 | 0.3907 | 0.2541 | 0.2195 | 0.2804 | 0.2603 | 0.2966 | 0.2033 | 0.2998 | 0.1066 |
SOLOFusion<sub>long</sub> | 97.99 | 64.42 | 0.4850 | 0.3159 | 0.2490 | 0.3598 | 0.3460 | 0.4002 | 0.2814 | 0.3991 | 0.1480 |
SOLOFusion<sub>fusion</sub> | 92.86 | 64.53 | 0.5381 | 0.3806 | 0.3464 | 0.4058 | 0.3642 | 0.4329 | 0.2626 | 0.4480 | 0.1376 |
FCOS3D<sub>finetune</sub> | 107.82 | 62.09 | 0.3949 | 0.2849 | 0.2479 | 0.2574 | 0.2570 | 0.3218 | 0.1468 | 0.3321 | 0.1136 |
BEVFusion<sub>Cam</sub> | 109.02 | 57.81 | 0.4121 | 0.2777 | 0.2255 | 0.2763 | 0.2788 | 0.2902 | 0.1076 | 0.3041 | 0.1461 |
BEVFusion<sub>LiDAR</sub> | - | - | 0.6928 | - | - | - | - | - | - | - | - |
BEVFusion<sub>C+L</sub> | 43.80 | 97.41 | 0.7138 | 0.6963 | 0.6931 | 0.7044 | 0.6977 | 0.7018 | 0.6787 | - | - |
TransFusion | - | - | 0.6887 | 0.6843 | 0.6447 | 0.6819 | 0.6749 | 0.6843 | 0.6663 | - | - |
AutoAlignV2 | - | - | 0.6139 | 0.5849 | 0.5832 | 0.6006 | 0.5901 | 0.6076 | 0.5770 | - | - |
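
The mCE and mRR columns above can be recomputed from the per-corruption NDS columns. The following is a minimal sketch, assuming that mCE is the mean over corruption types of (1 - NDS) / (1 - NDS of the baseline) and that mRR is the mean over corruption types of NDS / clean NDS, with each tabulated NDS already averaged over the three severity levels. The function names are illustrative and are not part of this codebase.

```python
from statistics import mean

# Per-corruption NDS copied from the table above, in column order:
# Cam Crash, Frame Lost, Color Quant, Motion Blur, Bright, Low Light, Fog, Snow.
BASELINE = [0.2859, 0.2604, 0.3177, 0.2661, 0.4002, 0.2786, 0.3912, 0.1913]  # DETR3D
POLARFORMER_R101 = [0.3133, 0.2808, 0.3509, 0.3221, 0.4304, 0.2554, 0.4262, 0.2304]


def mean_corruption_error(model_nds, baseline_nds):
    """Assumed mCE (%): mean over corruptions of (1 - NDS) / (1 - NDS_baseline)."""
    ratios = [(1.0 - m) / (1.0 - b) for m, b in zip(model_nds, baseline_nds)]
    return 100.0 * mean(ratios)


def mean_resilience_rate(model_nds, clean_nds):
    """Assumed mRR (%): mean over corruptions of NDS / NDS_clean."""
    return 100.0 * mean([n / clean_nds for n in model_nds])


print(f"DETR3D mRR:           {mean_resilience_rate(BASELINE, 0.4224):.2f}")
print(f"PolarFormer-R101 mCE: {mean_corruption_error(POLARFORMER_R101, BASELINE):.2f}")
print(f"PolarFormer-R101 mRR: {mean_resilience_rate(POLARFORMER_R101, 0.4602):.2f}")
```

Under these assumptions the three printed values agree with the corresponding table entries up to rounding, and the baseline evaluated against itself gives an mCE of 100 by construction.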
Model | Metric | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
---|---|---|---|---|---|---|---|---|---|---|
SurroundDepth | Abs Rel | 0.280 | 0.485 | 0.497 | 0.334 | 0.338 | 0.339 | 0.354 | 0.320 | 0.423 |
Model | Metric | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
---|---|---|---|---|---|---|---|---|---|---|
TPVFormer | mIoU vox | 52.06 | 27.39 | 22.85 | 38.16 | 38.64 | 49.00 | 37.38 | 46.69 | 19.39 |
SurroundOcc | SC mIoU | 20.30 | 11.60 | 10.00 | 14.03 | 12.41 | 19.18 | 12.15 | 18.42 | 7.39 |
Model | Pretrain | Temporal | Depth | CBGS | Backbone | Encoder<sub>BEV</sub> | Input Size | mCE (%) | mRR (%) | NDS |
---|---|---|---|---|---|---|---|---|---|---|
DETR3D | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 100.00 | 70.77 | 0.4224 |
DETR3D<sub>CBGS</sub> | ✓ | ✗ | ✗ | ✓ | ResNet | Attention | 1600×900 | 99.21 | 70.02 | 0.4341 |
BEVFormer<sub>Small</sub> | ✓ | ✓ | ✗ | ✗ | ResNet | Attention | 1280×720 | 101.23 | 59.07 | 0.4787 |
BEVFormer<sub>Base</sub> | ✓ | ✓ | ✗ | ✗ | ResNet | Attention | 1600×900 | 97.97 | 60.40 | 0.5174 |
PETR<sub>R50-p4</sub> | ✗ | ✗ | ✗ | ✗ | ResNet | Attention | 1408×512 | 111.01 | 61.26 | 0.3665 |
PETR<sub>VoV-p4</sub> | ✓ | ✗ | ✗ | ✗ | VoVNetV2 | Attention | 1600×900 | 100.69 | 65.03 | 0.4550 |
ORA3D | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 99.17 | 68.63 | 0.4436 |
PolarFormer<sub>R101</sub> | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 96.06 | 70.88 | 0.4602 |
PolarFormer<sub>VoV</sub> | ✓ | ✗ | ✗ | ✗ | VoVNetV2 | Attention | 1600×900 | 98.75 | 67.51 | 0.4558 |
SRCN3D<sub>R101</sub> | ✓ | ✗ | ✗ | ✗ | ResNet | CNN+Attn. | 1600×900 | 99.67 | 70.23 | 0.4286 |
SRCN3D<sub>VoV</sub> | ✓ | ✗ | ✗ | ✗ | VoVNetV2 | CNN+Attn. | 1600×900 | 102.04 | 67.95 | 0.4205 |
Sparse4D<sub>R101</sub> | ✓ | ✓ | ✗ | ✗ | ResNet | CNN+Attn. | 1600×900 | 100.01 | 55.04 | 0.5438 |
BEVDet<sub>R50</sub> | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 115.12 | 51.83 | 0.3770 |
BEVDet<sub>R101</sub> | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 113.68 | 53.12 | 0.3877 |
BEVDet<sub>R101-pt</sub> | ✓ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 112.80 | 56.35 | 0.3780 |
BEVDet<sub>SwinT</sub> | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 704×256 | 116.48 | 46.26 | 0.4037 |
BEVDepth<sub>R50</sub> | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 110.02 | 56.82 | 0.4058 |
BEVerse<sub>SwinT</sub> | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 704×256 | 137.25 | 28.24 | 0.1603 |
BEVerse<sub>SwinT</sub> | ✗ | ✓ | ✓ | ✓ | Swin | CNN | 704×256 | 110.67 | 48.60 | 0.4665 |
BEVerse<sub>SwinS</sub> | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 1408×512 | 132.13 | 29.54 | 0.2682 |
BEVerse<sub>SwinS</sub> | ✗ | ✓ | ✓ | ✓ | Swin | CNN | 1408×512 | 117.82 | 49.57 | 0.4951 |
SOLOFusion<sub>short</sub> | ✗ | ✓ | ✓ | ✗ | ResNet | CNN | 704×256 | 108.68 | 61.45 | 0.3907 |
SOLOFusion<sub>long</sub> | ✗ | ✓ | ✓ | ✗ | ResNet | CNN | 704×256 | 97.99 | 64.42 | 0.4850 |
SOLOFusion<sub>fusion</sub> | ✗ | ✓ | ✓ | ✓ | ResNet | CNN | 704×256 | 92.86 | 64.53 | 0.5381 |
Note: Pretrain denotes models initialized from the FCOS3D checkpoint. Temporal indicates whether temporal information is used. Depth denotes models with an explicit depth estimation branch. CBGS marks models that use the class-balanced group-sampling strategy.
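
One simple way to read the table above is to group rows by a design flag and average a robustness metric within each group. The sketch below does this for the Depth column and the mRR values copied from the table. It is only a descriptive summary, not a controlled comparison, because the groups also differ in backbone, input size, and other settings. The helper name `mean_mrr_by_depth` is illustrative.

```python
from statistics import mean

# (model, has explicit depth branch, mRR in %), copied row by row from the table above.
ROWS = [
    ("DETR3D", False, 70.77), ("DETR3D-CBGS", False, 70.02),
    ("BEVFormer-Small", False, 59.07), ("BEVFormer-Base", False, 60.40),
    ("PETR-R50-p4", False, 61.26), ("PETR-VoV-p4", False, 65.03),
    ("ORA3D", False, 68.63), ("PolarFormer-R101", False, 70.88),
    ("PolarFormer-VoV", False, 67.51), ("SRCN3D-R101", False, 70.23),
    ("SRCN3D-VoV", False, 67.95), ("Sparse4D-R101", False, 55.04),
    ("BEVDet-R50", True, 51.83), ("BEVDet-R101", True, 53.12),
    ("BEVDet-R101-pt", True, 56.35), ("BEVDet-SwinT", True, 46.26),
    ("BEVDepth-R50", True, 56.82), ("BEVerse-SwinT (no temporal)", True, 28.24),
    ("BEVerse-SwinT", True, 48.60), ("BEVerse-SwinS (no temporal)", True, 29.54),
    ("BEVerse-SwinS", True, 49.57), ("SOLOFusion-short", True, 61.45),
    ("SOLOFusion-long", True, 64.42), ("SOLOFusion-fusion", True, 64.53),
]


def mean_mrr_by_depth(rows):
    """Average mRR separately for models with and without a depth branch."""
    groups = {"depth-based": [], "depth-free": []}
    for _name, has_depth, mrr in rows:
        groups["depth-based" if has_depth else "depth-free"].append(mrr)
    return {label: mean(values) for label, values in groups.items()}


for label, value in mean_mrr_by_depth(ROWS).items():
    print(f"{label}: mean mRR = {value:.2f}%")
```

The same pattern applies to the Pretrain, Temporal, and CBGS columns by swapping the flag stored in each tuple.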
You can create your own "RoboBEV" corruption sets by following the instructions in CREATE.md.
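
As a rough illustration of what a corruption with severity levels can look like, the sketch below blacks out a number of camera views that grows with severity. This is not the procedure from CREATE.md: the function `camera_crash`, the severity-to-count mapping, and the toy images are all hypothetical and chosen only to keep the example self-contained.

```python
import random

CAMERAS = ["FRONT_LEFT", "FRONT", "FRONT_RIGHT", "BACK_LEFT", "BACK", "BACK_RIGHT"]

# Hypothetical mapping from severity level to the number of blacked-out cameras.
DROPPED_PER_SEVERITY = {1: 2, 2: 4, 3: 5}


def black_like(image):
    """Return an all-zero image with the same height and width as `image`."""
    return [[0 for _ in row] for row in image]


def camera_crash(views, severity, seed=0):
    """Return a new dict in which some views are replaced by black frames."""
    rng = random.Random(seed)  # seeded so the corrupted set is reproducible
    dropped = set(rng.sample(sorted(views), DROPPED_PER_SEVERITY[severity]))
    return {name: black_like(img) if name in dropped else img
            for name, img in views.items()}


# Toy 2x3 grayscale frames standing in for real camera images.
views = {name: [[128, 128, 128], [128, 128, 128]] for name in CAMERAS}
for severity in (1, 2, 3):
    corrupted = camera_crash(views, severity)
    blacked = [n for n, img in corrupted.items() if not any(map(any, img))]
    print(f"severity {severity}: {len(blacked)} views blacked out -> {sorted(blacked)}")
```

Fixing the random seed per severity level keeps the generated set identical across runs, which matters when different models are compared on the same corrupted data.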
If you find this work helpful, please kindly consider citing the following:
@article{xie2024benchmarking,
title = {Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving},
author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
journal = {arXiv preprint arXiv:2405.17426},
year = {2024}
}
@article{xie2023robobev,
title = {RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions},
author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
journal = {arXiv preprint arXiv:2304.06719},
year = {2023}
}
@misc{xie2023robobev_codebase,
title = {The RoboBEV Benchmark for Robust Bird's Eye View Detection under Common Corruption and Domain Shift},
author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
howpublished = {\url{https://github.com/Daniel-xsy/RoboBEV}},
year = {2023}
}
This work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, although some specific operations in this codebase may be covered by other licenses. Please check LICENSE.md carefully if you intend to use our code for commercial purposes.
This work is developed based on the MMDetection3D codebase.
MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by MMLab.
:heart: We thank Jiangmiao Pang and Tai Wang for their insightful discussions and feedback. We thank the OpenDataLab platform for hosting our datasets.