This is "The One" project that OpenDriveLab is committed to contributing to the community, offering some thoughts and a general picture of how to bring foundation models into autonomous driving.
Simulated futures in a wide range of driving scenarios by Vista. Best viewed on the demo page.
Quick facts:
OpenDV-YouTube, nuScenes
@inproceedings{gao2024vista,
title={Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability},
author={Shenyuan Gao and Jiazhi Yang and Li Chen and Kashyap Chitta and Yihang Qiu and Andreas Geiger and Jun Zhang and Hongyang Li},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024}
}
@inproceedings{yang2024genad,
title={{Generalized Predictive Model for Autonomous Driving}},
author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
Examples of real-world driving scenarios in the OpenDV dataset, including urban, highway, rural scenes, etc.
🎦 The largest driving video dataset to date, containing more than 1700 hours of real-world driving videos, which makes it 300 times larger than the widely used nuScenes dataset.
The videos (mostly 1080P) consume about 3 TB of storage space. However, these hour-long videos cannot be used directly for model training because they are extremely memory-consuming.
24 TB storage space in total.
OpenDV-YouTube. The raw videos consume about 44 GB of storage space and the processed images will consume about 390 GB of storage space.

Quick facts:
YouTube, with careful collection and filtering process.
Accepted at CVPR 2024, Highlight
Note: Annotations for other public datasets in OpenDV-2K will not be released, since we randomly sampled a subset of each for training and these subsets are incomplete and hard to trace back to their origins (i.e., file names). Nevertheless, it is easy to reproduce the collection and annotation process on your own by following our paper.
@inproceedings{yang2024genad,
title={Generalized Predictive Model for Autonomous Driving},
author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
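The storage figures quoted above for OpenDV can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that the 24 TB figure refers to the processed images of the full dataset and that the 44 GB / 390 GB pair describes a smaller release of the same data; the surrounding text suggests both but does not state them outright. The 5.5-hour figure for nuScenes is taken from the dataset comparison table in this document. All constant and function names are hypothetical.

```python
# Figures quoted in the text above. Which figure belongs to which
# release is an assumption (see the lead-in paragraph).
RAW_FULL_TB = 3            # raw videos, full dataset
PROCESSED_FULL_TB = 24     # processed images, full dataset (assumed)
RAW_SMALL_GB = 44          # raw videos, smaller release (assumed)
PROCESSED_SMALL_GB = 390   # processed images, smaller release (assumed)
OPENDV_HOURS = 1700        # "more than 1700 hours"
NUSCENES_HOURS = 5.5       # from the dataset comparison table


def expansion_factor(raw: float, processed: float) -> float:
    """How many times larger the processed images are than the raw videos."""
    return processed / raw


full_factor = expansion_factor(RAW_FULL_TB, PROCESSED_FULL_TB)
small_factor = expansion_factor(RAW_SMALL_GB, PROCESSED_SMALL_GB)
hours_ratio = OPENDV_HOURS / NUSCENES_HOURS

print(f"full dataset: processed is {full_factor:.1f}x the raw size")
print(f"smaller release: processed is {small_factor:.1f}x the raw size")
print(f"OpenDV vs. nuScenes hours: about {hours_ratio:.0f}x")
```

Under these assumptions, both releases expand by roughly 8 to 9 times when videos are unpacked into images, and the ratio of hours is consistent with the "300 times larger" claim.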
Introducing the first benchmark on Language Prompt for Driving.
Quick facts:
nuScenes, CARLA (to be released)
| Title | Host | Year | Task | Entry |
|---|---|---|---|---|
| Autonomous Driving Challenge | OpenDriveLab | CVPR2023 | Perception / OpenLane Topology | 111 |
| | | | Perception / Online HD Map Construction | |
| | | | Perception / 3D Occupancy Prediction | |
| | | | Prediction & Planning / nuPlan Planning | |
| Waymo Open Dataset Challenges | Waymo | CVPR2023 | Perception / 2D Video Panoptic Segmentation | 35 |
| | | | Perception / Pose Estimation | |
| | | | Prediction / Motion Prediction | |
| | | | Prediction / Sim Agents | |
| | | CVPR2022 | Prediction / Motion Prediction | 128 |
| | | | Prediction / Occupancy and Flow Prediction | |
| | | | Perception / 3D Semantic Segmentation | |
| | | | Perception / 3D Camera-only Detection | |
| | | CVPR2021 | Prediction / Motion Prediction | 115 |
| | | | Prediction / Interaction Prediction | |
| | | | Perception / Real-time 3D Detection | |
| | | | Perception / Real-time 2D Detection | |
| Argoverse Challenges | Argoverse | CVPR2023 | Prediction / Multi-agent Forecasting | 81 |
| | | | Perception & Prediction / Unified Sensor-based Detection, Tracking, and Forecasting | |
| | | | Perception / LiDAR Scene Flow | |
| | | | Prediction / 3D Occupancy Forecasting | |
| | | CVPR2022 | Perception / 3D Object Detection | 81 |
| | | | Prediction / Motion Forecasting | |
| | | | Perception / Stereo Depth Estimation | |
| | | CVPR2021 | Perception / Stereo Depth Estimation | 368 |
| | | | Prediction / Motion Forecasting | |
| | | | Perception / Streaming 2D Detection | |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | 2023 | Planning / CARLA AD Challenge 2.0 | - |
| | | NeurIPS2022 | Planning / CARLA AD Challenge 1.0 | 19 |
| | | NeurIPS2021 | Planning / CARLA AD Challenge 1.0 | - |
| Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition | Pazhou Laboratory | 2023 | Perception / Cross-scene Monocular Depth Estimation | - |
| | | | Perception / Roadside Millimeter-wave Radar Calibration and Object Tracking | - |
| | | 2022 | Perception / Roadside 3D Perception Algorithm | - |
| | | | Perception / Storefront Sign Text Recognition in Street-view Images | - |
| AI Driving Olympics | ETH Zurich, University of Montreal, Motional | NeurIPS2021 | Perception / nuScenes Panoptic | 11 |
| | | ICRA2021 | Perception / nuScenes Detection | 456 |
| | | | Perception / nuScenes Tracking | |
| | | | Prediction / nuScenes Prediction | |
| | | | Perception / nuScenes LiDAR Segmentation | |
| Jittor Artificial Intelligence Algorithm Challenge | Department of Information Sciences, National Natural Science Foundation of China | 2021 | Perception / Traffic Sign Detection | 37 |
| KITTI Vision Benchmark Suite | University of Tübingen | 2012 | Perception / Stereo, Flow, Scene Flow, Depth, Odometry, Object, Tracking, Road, Semantics | 5,610 |
| Dataset | Year | Diversity: Scenes | Diversity: Hours | Diversity: Region | Sensor: Camera | Sensor: Lidar | Sensor: Other | Annotation | Paper |
|---|---|---|---|---|---|---|---|---|---|
| KITTI | 2012 | 50 | 6 | EU | Front-view | ✗ | GPS & IMU | 2D BBox & 3D BBox | Link |
| Cityscapes | 2016 | - | - | EU | Front-view | ✗ | | 2D Seg | Link |
| Lost and Found | 2016 | 112 | - | - | Front-view | ✗ | | 2D Seg | Link |
| Mapillary | 2016 | - | - | Global | Street-view | ✗ | | 2D Seg | Link |
| DDD17 | 2017 | 36 | 12 | EU | Front-view | ✗ | GPS & CAN-bus & Event Camera | - | Link |
| Apolloscape | 2016 | 103 | 2.5 | AS | Front-view | ✗ | GPS & IMU | 3D BBox & 2D Seg | Link |
| BDD-X | 2018 | 6984 | 77 | NA | Front-view | ✗ | | Language | Link |
| HDD | 2018 | - | 104 | NA | Front-view | ✓ | GPS & IMU & CAN-bus | 2D BBox | Link |
| IDD | 2018 | 182 | - | AS | Front-view | ✗ | | 2D Seg | Link |
| SemanticKITTI | 2019 | 50 | 6 | EU | ✗ | ✓ | | 3D Seg | Link |
| Woodscape | 2019 | - | - | Global | 360° | ✓ | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| DrivingStereo | 2019 | 42 | - | AS | Front-view | ✓ | | - | Link |
| Brno-Urban | 2019 | 67 | 10 | EU | Front-view | ✓ | GPS & IMU & Infrared Camera | - | Link |
| A*3D | 2019 | - | 55 | AS | Front-view | ✓ | | 3D BBox | Link |
| Talk2Car | 2019 | 850 | 283.3 | NA | Front-view | ✓ | | Language & 3D BBox | Link |
| Talk2Nav | 2019 | 10714 | - | Sim | 360° | ✗ | | Language | Link |
| PIE | 2019 | - | 6 | NA | Front-view | ✗ | | 2D BBox | Link |
| UrbanLoco | 2019 | 13 | - | AS & NA | 360° | ✓ | IMU | - | Link |
| TITAN | 2019 | 700 | - | AS | Front-view | ✗ | | 2D BBox | Link |
| H3D | 2019 | 160 | 0.77 | NA | Front-view | ✓ | GPS & IMU | - | Link |
| A2D2 | 2020 | - | 5.6 | EU | 360° | ✓ | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| CARRADA | 2020 | 30 | 0.3 | NA | Front-view | ✗ | Radar | 3D BBox | Link |
| DAWN | 2019 | - | - | Global | Front-view | ✗ | | 2D BBox | Link |
| 4Seasons | 2019 | - | - | - | Front-view | ✗ | GPS & IMU | - | Link |
| UNDD | 2019 | - | - | - | Front-view | ✗ | | 2D Seg | Link |
| SemanticPOSS | 2020 | - | - | AS | ✗ | ✓ | GPS & IMU | 3D Seg | Link |
| Toronto-3D | 2020 | 4 | - | NA | ✗ | ✓ | | 3D Seg | Link |
| ROAD | 2021 | 22 | - | EU | Front-view | ✗ | | 2D BBox & Topology | Link |
| Reasonable Crowd | 2021 | - | - | Sim | Front-view | ✗ | | Language | Link |
| METEOR | 2021 | 1250 | 20.9 | AS | Front-view | ✗ | GPS | Language | Link |
| PandaSet | 2021 | 179 | - | NA | 360° | ✓ | GPS & IMU | 3D BBox | Link |
| MUAD | 2022 | - | - | Sim | 360° | ✓ | | 2D Seg & 2D BBox | Link |
| TAS-NIR | 2022 | - | - | - | Front-view | ✗ | Infrared Camera | 2D Seg | Link |
| LiDAR-CS | 2022 | 6 | - | Sim | ✗ | ✓ | | 3D BBox | Link |
| WildDash | 2022 | - | - | - | Front-view | ✗ | | 2D Seg | Link |
| OpenScene | 2023 | 1000 | 5.5 | AS & NA | 360° | ✗ | | 3D Occ | Link |
| ZOD | 2023 | 1473 | 8.2 | EU | 360° | ✓ | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| nuScenes | 2019 | 1000 | 5.5 | AS & NA | 360° | ✓ | GPS & CAN-bus & Radar & HDMap | 3D BBox & 3D Seg | Link |
| Argoverse V1 | 2019 | 324k | 320 | NA | 360° | ✓ | HDMap | 3D BBox & 3D Seg | Link |
| Waymo | 2019 | 1000 | 6.4 | NA | 360° | ✓ | | 2D BBox & 3D BBox | Link |
| KITTI-360 | 2020 | 366 | 2.5 | EU | 360° | ✓ | | 3D BBox & 3D Seg | Link |
| ONCE | 2021 | - | 144 | AS | 360° | ✓ | | 3D BBox | Link |
| nuPlan | 2021 | - | 120 | AS & NA | 360° | ✓ | | 3D BBox | Link |
| Argoverse V2 | 2022 | 1000 | 4 | NA | 360° | ✓ | HDMap | 3D BBox | Link |
| DriveLM | 2023 | 1000 | 5.5 | AS & NA | 360° | ✗ | | Language | Link |
| Dataset | Year | Diversity: Scenes | Diversity: Frames | Sensor: Camera | Sensor: Lidar | Annotation: Type | Annotation: Space | Annotation: Inst. | Annotation: Track | Paper |
|---|---|---|---|---|---|---|---|---|---|---|
| Caltech Lanes | 2008 | 4 | 1224/1224 | Front-view Image | ✗ | Laneline | PV | ✓ | ✗ | Link |
| VPG | 2017 | - | 20K/20K | Front-view Image | ✗ | Laneline | PV | ✗ | - | Link |
| TuSimple | 2017 | 6.4K | 6.4K/128K | Front-view Image | ✗ | Laneline | PV | ✓ | ✗ | Link |
| CULane | 2018 | - | 133K/133K | Front-view Image | ✗ | Laneline | PV | ✓ | - | Link |
| ApolloScape | 2018 | 235 | 115K/115K | Front-view Image | ✓ | Laneline | PV | ✗ | ✗ | Link |
| LLAMAS | 2019 | 14 | 79K/100K | Front-view Image | ✗ | Laneline | PV | ✓ | ✗ | Link |
| 3D Synthetic | 2020 | - | 10K/10K | Front-view Image | ✗ | Laneline | PV | ✓ | - | Link |
| CurveLanes | 2020 | - | 150K/150K | Front-view Image | ✗ | Laneline | PV | ✓ | - | Link |
| VIL-100 | 2021 | 100 | 10K/10K | Front-view Image | ✗ | Laneline | PV | ✓ | ✗ | Link |
| OpenLane-V1 | 2022 | 1K | 200K/200K | Front-view Image | ✗ | Laneline | 3D | ✓ | ✓ | Link |
| ONCE-3DLane | 2022 | - | 211K/211K | Front-view Image | ✗ | Laneline | 3D | ✓ | - | Link |
| OpenLane-V2 | 2023 | 2K | 72K/72K | Multi-view Image | ✗ | Lane Centerline, Lane Segment | 3D | ✓ | ✓ | Link |
| Subtask | Input | Output | Evaluation | Dataset |
|---|---|---|---|---|
| Motion Prediction | Surrounding Traffic States | Spatiotemporal Trajectories of Single/Multiple Vehicle(s) | Displacement Error | Argoverse |
| | | | | nuScenes |
| | | | | Waymo |
| | | | | Interaction |
| | | | | MONA |
| Trajectory Planning | Motion States for Ego Vehicles, Scenario Cognition and Prediction | Trajectories for Ego Vehicles | Displacement Error, Safety, Compliance, Comfort | nuPlan |
| | | | | CARLA |
| | | | | MetaDrive |
| | | | | Apollo |
| Path Planning | Maps for Road Network | Routes Connecting to Nodes and Links | Efficiency, Energy Conservation | OpenStreetMap |
| | | | | Transportation Networks |
| | | | | DTAlite |
| | | | | PeMS |
| | | | | New York City Taxi Data |
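
The table lists "Displacement Error" as the evaluation for motion prediction and trajectory planning without defining it. As a rough illustration, the sketch below computes the two variants commonly reported in the literature, average displacement error (ADE) and final displacement error (FDE), for a single 2D trajectory. Choosing ADE and FDE is an assumption here, since the table does not say which variant each benchmark uses, and the function name `displacement_errors` is hypothetical rather than taken from any benchmark toolkit.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def displacement_errors(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against ground truth.

    ADE is the mean Euclidean distance over all timesteps;
    FDE is the Euclidean distance at the last timestep.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-timestep Euclidean distance between prediction and ground truth.
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    ade = sum(dists) / len(dists)
    fde = dists[-1]
    return ade, fde


# Toy example: the prediction drifts sideways by one unit per step.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, gt)
print(f"ADE={ade}, FDE={fde}")
```

Individual benchmarks typically extend this idea, for example by taking the minimum over several predicted modes, so the official evaluation code of each dataset remains the reference.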