DriveAGI

This is "The One" project that OpenDriveLab is committed to contributing to the community. It offers our thoughts on, and a general picture of, how to bring foundation models into autonomous driving.

NEWS
At A Glance
Vista 🚀
(GenAD Dataset) OpenDV-YouTube 🔥
DriveData Survey
- Abstract
- Related Work Collection
DriveLM 🔥
OpenScene
OpenLane-V2 Update
NEWS

2024/05/28 We released our latest research, Vista, a generalizable driving world model. It's capable of predicting high-fidelity and long-horizon futures, executing multi-modal actions, and serving as a generalizable reward function to assess driving behaviors.

2024/03/24 OpenDV-YouTube Update: Full suite of toolkits for OpenDV-YouTube is now available, including data downloading and processing scripts, as well as language annotations. Please refer to OpenDV-YouTube.

2024/03/15 We released the complete video list of OpenDV-YouTube, a large-scale driving video dataset, for the GenAD project. Data downloading and processing scripts, as well as language annotations, will be released next week. Stay tuned.

2024/01/24 We are excited to announce an update to our survey and would like to thank John Lambert and Klemens Esterle from the public community for their advice on improving the manuscript.

At A Glance

Here are some key components to construct a large foundation model curated for an autonomous system.

Below we would like to share the latest updates from our team on the DriveData side. We will release the details of the DriveEngine and the DriveAGI in the future.

Vista

A Generalizable Driving World Model with High Fidelity and Versatile Controllability 🌏

Quick facts:

Introducing the world's first generalizable driving world model.
Task: High-fidelity, action-conditioned, and long-horizon future prediction for driving scenes in the wild.
Dataset: OpenDV-YouTube, nuScenes
Code and model: https://github.com/OpenDriveLab/Vista
Video Demo: https://vista-demo.github.io
Related work: Vista, GenAD

@article{gao2024vista,
 title={Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability}, 
 author={Shenyuan Gao and Jiazhi Yang and Li Chen and Kashyap Chitta and Yihang Qiu and Andreas Geiger and Jun Zhang and Hongyang Li},
 journal={arXiv preprint arXiv:2405.17398},
 year={2024}
}

@inproceedings{yang2024genad,
  title={{Generalized Predictive Model for Autonomous Driving}},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

(GenAD Dataset) OpenDV-YouTube

Generalized Predictive Model for Autonomous Driving (CVPR'24, Highlight ⭐)

Paper | Video | Poster | Slides

The largest driving video dataset to date, containing more than 1700 hours of real-world driving videos, which makes it about 300 times larger than the widely used nuScenes dataset.

Complete video list (under YouTube license): OpenDV Videos.
- The downloaded raw videos (mostly 1080P) consume about 3 TB of storage space. However, these hour-long videos cannot be used directly for model training because they are extremely memory-consuming.
- Therefore, we preprocess them into consecutive images, which are more flexible and efficient to load during training. The processed images consume about 24 TB of storage space in total.
- It's recommended to set up your experiments on a small subset, say 1/20 of the whole dataset. After stabilizing the training, you can then apply your method to the whole dataset and hope for the best 🤞.
Step-by-step instruction for data preparation: OpenDV-YouTube.
Language annotation for OpenDV-YouTube: OpenDV-YouTube-Language.
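
The subset recommendation and the storage figures above can be turned into a quick planning sketch. The following is only an illustration and is not part of the OpenDV-YouTube toolkit: the function names, the hash-based selection rule, and the placeholder video IDs are all hypothetical, and the estimate assumes that storage scales linearly with the fraction of videos kept.

```python
import hashlib

# Approximate figures quoted in this README.
RAW_TB = 3.0          # raw videos, mostly 1080P
PROCESSED_TB = 24.0   # after conversion to consecutive images
TOTAL_HOURS = 1700.0


def in_subset(video_id: str, keep_one_in: int = 20) -> bool:
    """Deterministically keep roughly 1/keep_one_in of all IDs (hypothetical rule)."""
    digest = hashlib.sha256(video_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % keep_one_in == 0


def estimate_subset(fraction: float) -> dict:
    """Scale the README totals linearly by the chosen fraction (an assumption)."""
    return {
        "hours": TOTAL_HOURS * fraction,
        "raw_tb": RAW_TB * fraction,
        "processed_tb": PROCESSED_TB * fraction,
    }


if __name__ == "__main__":
    # Placeholder IDs; real ones would come from the OpenDV video list.
    video_ids = [f"video_{i:05d}" for i in range(2000)]
    subset = [v for v in video_ids if in_subset(v)]
    print(f"kept {len(subset)} of {len(video_ids)} placeholder videos")
    print(estimate_subset(1 / 20))
```

Under these assumptions, a 1/20 subset corresponds to roughly 85 hours of video, about 0.15 TB of raw video, and about 1.2 TB of processed images. Hashing the ID rather than sampling at random keeps the subset stable across runs and machines.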

Quick facts:

Task: large-scale video prediction for driving scenes.
Data source: YouTube, with careful collection and filtering process.
Diversity Highlights: 1700 hours of driving videos, covering more than 244 cities in 40 countries.
Related work: GenAD (accepted at CVPR 2024, Highlight)
Note: Annotations for the other public datasets in OpenDV-2K will not be released. We randomly sampled a subset of each for training, so the sampled data are incomplete and hard to trace back to their origins (i.e., file names). Nevertheless, it's easy to reproduce the collection and annotation process on your own by following our paper.

@inproceedings{yang2024genad,
  title={Generalized Predictive Model for Autonomous Driving},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

DriveData Survey

Abstract

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. In this survey, we provide a comprehensive analysis of more than 70 papers on the timeline, impact, challenges, and future trends of autonomous driving datasets.

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

English Version

Chinese Version (accepted at SCIENTIA SINICA Informationis)

@article{li2024_driving_dataset_survey,
  title = {Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future},
  author = {Hongyang Li and Yang Li and Huijie Wang and Jia Zeng and Huilin Xu and Pinlong Cai and Li Chen and Junchi Yan and Feng Xu and Lu Xiong and Jingdong Wang and Futang Zhu and Chunjing Xu and Tiancai Wang and Fei Xia and Beipeng Mu and Zhihui Peng and Dahua Lin and Yu Qiao},
  journal = {SCIENTIA SINICA Informationis},
  year = {2024},
  doi = {10.1360/SSI-2023-0313}
}

Current autonomous driving datasets can broadly be categorized into two generations since the 2010s. We define the Impact (y-axis) of a dataset based on sensor configuration, input modality, task category, data scale, ecosystem, etc.

Related Work Collection

We present comprehensive paper collections, leaderboards, and challenges.

Challenges and Leaderboards

Title	Host	Year	Task	Entry
Autonomous Driving Challenge	OpenDriveLab	CVPR2023	Perception / OpenLane Topology	111
			Perception / Online HD Map Construction
			Perception / 3D Occupancy Prediction
			Prediction & Planning / nuPlan Planning
Waymo Open Dataset Challenges	Waymo	CVPR2023	Perception / 2D Video Panoptic Segmentation	35
			Perception / Pose Estimation
			Prediction / Motion Prediction
			Prediction / Sim Agents
		CVPR2022	Prediction / Motion Prediction	128
			Prediction / Occupancy and Flow Prediction
			Perception / 3D Semantic Segmentation
			Perception / 3D Camera-only Detection
		CVPR2021	Prediction / Motion Prediction	115
			Prediction / Interaction Prediction
			Perception / Real-time 3D Detection
			Perception / Real-time 2D Detection
Argoverse Challenges	Argoverse	CVPR2023	Prediction / Multi-agent Forecasting	81
			Perception & Prediction / Unified Sensor-based Detection, Tracking, and Forecasting
			Perception / LiDAR Scene Flow
			Prediction / 3D Occupancy Forecasting
		CVPR2022	Perception / 3D Object Detection	81
			Prediction / Motion Forecasting
			Perception / Stereo Depth Estimation
		CVPR2021	Perception / Stereo Depth Estimation	368
			Prediction / Motion Forecasting
			Perception / Streaming 2D Detection
CARLA Autonomous Driving Challenge	CARLA Team, Intel	2023	Planning / CARLA AD Challenge 2.0	-
		2023		-
		NeurIPS2022	Planning / CARLA AD Challenge 1.0	19
		NeurIPS2022		19
		NeurIPS2021	Planning / CARLA AD Challenge 1.0	-
Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition (粤港澳大湾区（黄埔）国际算法算例大赛)	Pazhou Lab	2023	Perception / Cross-scene Monocular Depth Estimation	-
			Perception / Roadside Millimeter-wave Radar Calibration and Object Tracking	-
		2022	Perception / Roadside 3D Perception Algorithm	-
			Perception / Street-view Storefront Signboard Text Recognition	-
AI Driving Olympics	ETH Zurich, University of Montreal, Motional	NeurIPS2021	Perception / nuScenes Panoptic	11
		ICRA2021	Perception / nuScenes Detection	456
			Perception / nuScenes Tracking
			Prediction / nuScenes Prediction
			Perception / nuScenes LiDAR Segmentation
Jittor Artificial Intelligence Algorithm Challenge (计图人工智能算法挑战赛)	Department of Information Sciences, National Natural Science Foundation of China	2021	Perception / Traffic Sign Detection	37
KITTI Vision Benchmark Suite	University of Tübingen	2012	Perception / Stereo, Flow, Scene Flow, Depth, Odometry, Object, Tracking, Road, Semantics	5,610

Perception Datasets

Dataset	Year	Diversity			Sensor			Annotation	Paper

		Scenes	Hours	Region	Camera	Lidar	Other
KITTI	2012	50	6	EU	Front-view	✗	GPS & IMU	2D BBox & 3D BBox	Link
Cityscapes	2016	-	-	EU	Front-view	✗		2D Seg	Link
Lost and Found	2016	112	-	-	Front-view	✗		2D Seg	Link
Mapillary	2016	-	-	Global	Street-view	✗		2D Seg	Link
DDD17	2017	36	12	EU	Front-view	✗	GPS & CAN-bus & Event Camera	-	Link
ApolloScape	2016	103	2.5	AS	Front-view	✗	GPS & IMU	3D BBox & 2D Seg	Link
BDD-X	2018	6984	77	NA	Front-view	✗		Language	Link
HDD	2018	-	104	NA	Front-view	✓	GPS & IMU & CAN-bus	2D BBox	Link
IDD	2018	182	-	AS	Front-view	✗		2D Seg	Link
SemanticKITTI	2019	50	6	EU	✗	✓		3D Seg	Link
Woodscape	2019	-	-	Global	360°	✓	GPS & IMU & CAN-bus	3D BBox & 2D Seg	Link
DrivingStereo	2019	42	-	AS	Front-view	✓		-	Link
Brno-Urban	2019	67	10	EU	Front-view	✓	GPS & IMU & Infrared Camera	-	Link
A*3D	2019	-	55	AS	Front-view	✓		3D BBox	Link
Talk2Car	2019	850	283.3	NA	Front-view	✓		Language & 3D BBox	Link
Talk2Nav	2019	10714	-	Sim	360°	✗		Language	Link
PIE	2019	-	6	NA	Front-view	✗		2D BBox	Link
UrbanLoco	2019	13	-	AS & NA	360°	✓	IMU	-	Link
TITAN	2019	700	-	AS	Front-view	✗		2D BBox	Link
H3D	2019	160	0.77	NA	Front-view	✓	GPS & IMU	-	Link
A2D2	2020	-	5.6	EU	360°	✓	GPS & IMU & CAN-bus	3D BBox & 2D Seg	Link
CARRADA	2020	30	0.3	NA	Front-view	✗	Radar	3D BBox	Link
DAWN	2019	-	-	Global	Front-view	✗		2D BBox	Link
4Seasons	2019	-	-	-	Front-view	✗	GPS & IMU	-	Link
UNDD	2019	-	-	-	Front-view	✗		2D Seg	Link
SemanticPOSS	2020	-	-	AS	✗	✓	GPS & IMU	3D Seg	Link
Toronto-3D	2020	4	-	NA	✗	✓		3D Seg	Link
ROAD	2021	22	-	EU	Front-view	✗		2D BBox & Topology	Link
Reasonable Crowd	2021	-	-	Sim	Front-view	✗		Language	Link
METEOR	2021	1250	20.9	AS	Front-view	✗	GPS	Language	Link
PandaSet	2021	179	-	NA	360°	✓	GPS & IMU	3D BBox	Link
MUAD	2022	-	-	Sim	360°	✓		2D Seg & 2D BBox	Link
TAS-NIR	2022	-	-	-	Front-view	✗	Infrared Camera	2D Seg	Link
LiDAR-CS	2022	6	-	Sim	✗	✓		3D BBox	Link
WildDash	2022	-	-	-	Front-view	✗		2D Seg	Link
OpenScene	2023	1000	5.5	AS & NA	360°	✗		3D Occ	Link
ZOD	2023	1473	8.2	EU	360°	✓	GPS & IMU & CAN-bus	3D BBox & 2D Seg	Link
nuScenes	2019	1000	5.5	AS & NA	360°	✓	GPS & CAN-bus & Radar & HDMap	3D BBox & 3D Seg	Link
Argoverse V1	2019	324k	320	NA	360°	✓	HDMap	3D BBox & 3D Seg	Link
Waymo	2019	1000	6.4	NA	360°	✓		2D BBox & 3D BBox	Link
KITTI-360	2020	366	2.5	EU	360°	✓		3D BBox & 3D Seg	Link
ONCE	2021	-	144	AS	360°	✓		3D BBox	Link
nuPlan	2021	-	120	AS & NA	360°	✓		3D BBox	Link
Argoverse V2	2022	1000	4	NA	360°	✓	HDMap	3D BBox	Link
DriveLM	2023	1000	5.5	AS & NA	360°	✗		Language	Link

Mapping Datasets

Dataset	Year	Diversity		Sensor		Annotation				Paper

		Scenes	Frames	Camera	Lidar	Type	Space	Inst.	Track
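Caltech Lanes	2008	4	1224/1224	Front-view Image	✗	Laneline	PV	✓	✗	Link
VPG	2017	-	20K/20K	Front-view Image	✗	Laneline	PV	✗	-	Link
TuSimple	2017	6.4K	6.4K/128K	Front-view Image	✗	Laneline	PV	✓	✗	Link
CULane	2018	-	133K/133K	Front-view Image	✗	Laneline	PV	✓	-	Link
ApolloScape	2018	235	115K/115K	Front-view Image	✓	Laneline	PV	✗	✗	Link
LLAMAS	2019	14	79K/100K	Front-view Image	✗	Laneline	PV	✓	✗	Link
3D Synthetic	2020	-	10K/10K	Front-view Image	✗	Laneline	PV	✓	-	Link
CurveLanes	2020	-	150K/150K	Front-view Image	✗	Laneline	PV	✓	-	Link
VIL-100	2021	100	10K/10K	Front-view Image	✗	Laneline	PV	✓	✗	Link
OpenLane-V1	2022	1K	200K/200K	Front-view Image	✗	Laneline	3D	✓	✓	Link
ONCE-3DLane	2022	-	211K/211K	Front-view Image	✗	Laneline	3D	✓	-	Link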
OpenLane-V2	2023	2K	72K/72K	Multi-view Image	✗	Lane Centerline, Lane Segment	3D	✓	✓	Link

Prediction and Planning Datasets

Subtask	Input	Output	Evaluation	Dataset
Motion Prediction	Surrounding Traffic States	Spatiotemporal Trajectories of Single/Multiple Vehicle(s)	Displacement Error	Argoverse, nuScenes, Waymo, Interaction, MONA
Trajectory Planning	Motion States for Ego Vehicles, Scenario Cognition and Prediction	Trajectories for Ego Vehicles	Displacement Error, Safety, Compliance, Comfort	nuPlan, CARLA, MetaDrive, Apollo
Path Planning	Maps for Road Network	Routes Connecting to Nodes and Links	Efficiency, Energy Conservation	OpenStreetMap, Transportation Networks, DTAlite, PeMS, New York City Taxi Data
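
The "Displacement Error" entries in the table are commonly reported as average and final displacement error (ADE and FDE). The sketch below shows one common formulation for a single predicted trajectory. The function name and the sample waypoints are illustrative only, and individual benchmarks such as Argoverse or nuPlan define their own official variants (for example, taking the minimum over several predicted modes).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in meters


def displacement_errors(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against ground truth."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at each timestep.
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    ade = sum(dists) / len(dists)  # average over all timesteps
    fde = dists[-1]                # error at the final timestep only
    return ade, fde


if __name__ == "__main__":
    gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    ade, fde = displacement_errors(pred, gt)
    print(f"ADE={ade:.2f} m, FDE={fde:.2f} m")  # ADE=1.00 m, FDE=2.00 m
```

ADE rewards predictions that stay close to the ground truth along the whole horizon, while FDE only measures where the trajectory ends, so the two are usually reported together.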

DriveLM

Introducing the first benchmark on language prompts for driving.

Quick facts:

Task: given the language prompts as input, predict the trajectory in the scene
Origin dataset: nuScenes, CARLA (To be released)
Repo: https://github.com/OpenDriveLab/DriveLM, https://github.com/OpenDriveLab/ELM
Related work: DriveLM, ELM
Related challenge: Driving with Language AGC Challenge 2024

OpenScene

The largest 3D occupancy forecasting dataset to date for visual pre-training.

Quick facts:

Task: given the large amount of data, predict the 3D occupancy in the environment.
Origin dataset: nuPlan
Repo: https://github.com/OpenDriveLab/OpenScene
Related work: OccNet
Related challenge: 3D Occupancy Prediction Challenge 2023, Occupancy and Flow AGC Challenge 2024, Predictive World Model AGC Challenge 2024

OpenLane-V2 Update

Enriching OpenLane-V2 with Standard Definition (SD) maps and map elements.

Quick facts:

Task: given multi-view images and SD-map (also known as ADAS map) as input, build the driving scene on the fly without the aid of HD-map.
Repo: https://github.com/OpenDriveLab/OpenLane-V2
Related work: OpenLane-V2, TopoNet, LaneSegNet
Related challenge: Lane Topology Challenge 2023, Mapless Driving AGC Challenge 2024
