OpenDriveLab / DriveLM

[ECCV 2024] DriveLM: Driving with Graph Visual Question Answering
https://opendrivelab.com/DriveLM/
Apache License 2.0
autonomous-driving chain-of-thought graph-of-thoughts large-language-models llm prompt-engineering prompting tree-of-thoughts vision-language

**DriveLM:** *Driving with **G**raph **V**isual **Q**uestion **A**nswering* `Autonomous Driving Challenge 2024` **Driving-with-Language** [Leaderboard](https://opendrivelab.com/challenge2024/#driving_with_language). The leaderboard will re-open soon.
[![](https://img.shields.io/badge/Project%20Page-8A2BE2)](https://opendrivelab.com/DriveLM/) [![License: Apache2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](#licenseandcitation) [![arXiv](https://img.shields.io/badge/arXiv-2312.14150-b31b1b.svg)](https://arxiv.org/abs/2312.14150) [![](https://img.shields.io/badge/Latest%20release-v1.1-yellow)](#gettingstarted) [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DriveLM-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/spaces/AGC2024/driving-with-language-2024)

https://github.com/OpenDriveLab/DriveLM/assets/54334254/cddea8d6-9f6e-4e7e-b926-5afb59f8dce2

Highlights

🔥 We instantiate datasets (DriveLM-Data) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.

🏁 DriveLM serves as a main track in the CVPR 2024 Autonomous Driving Challenge. Everything you need for the challenge is HERE, including the baseline, test data, submission format, and evaluation pipeline!

News

Table of Contents

  1. Highlights
  2. Getting Started
  3. Current Endeavors and Future Horizons
  4. TODO List
  5. DriveLM-Data
  6. License and Citation
  7. Other Resources

Getting Started

To get started with DriveLM:

Current Endeavors and Future Horizons

  • The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
  • The dates below reflect arXiv submission dates.
  • If we have missed any relevant work, please reach out to us!

DriveLM attempts to address some of the challenges faced by the community.

TODO List

DriveLM-Data

DriveLM-Data covers the Perception, Prediction, Planning, Behavior, and Motion tasks, with human-written reasoning logic connecting them. On top of DriveLM-Data, we propose the Graph Visual Question Answering (GVQA) task.
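To make the graph structure of GVQA more concrete, here is a minimal sketch, assuming each question-answer pair is a node whose edges point to the QA pairs it logically depends on (for example, a planning question depending on a prediction answer). The `QANode` class, its fields, the example questions, and the helpers `reasoning_order` and `build_context` are hypothetical illustrations, not the DriveLM-Data file format or DriveLM-Agent's code. It uses only the Python standard library (`graphlib` requires Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


# Hypothetical node layout for illustration only; NOT the DriveLM-Data schema.
@dataclass
class QANode:
    node_id: str
    stage: str  # e.g. "perception", "prediction", "planning", "behavior", "motion"
    question: str
    answer: str
    parents: list[str] = field(default_factory=list)  # ids of QAs this one depends on


def reasoning_order(nodes: list[QANode]) -> list[str]:
    """Order node ids so that every QA appears after all of its parent QAs."""
    # TopologicalSorter expects a mapping: node -> set of predecessor nodes.
    graph = {n.node_id: set(n.parents) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


def build_context(nodes: list[QANode], node_id: str) -> str:
    """Concatenate the parent QA pairs of `node_id` into a textual context."""
    by_id = {n.node_id: n for n in nodes}
    parents = by_id[node_id].parents
    return "\n".join(f"Q: {by_id[p].question} A: {by_id[p].answer}" for p in parents)


# Hypothetical example: a simple chain across the five task stages.
nodes = [
    QANode("p1", "perception", "What are the important objects in front of the ego car?",
           "A pedestrian at the crosswalk."),
    QANode("pr1", "prediction", "Will the pedestrian cross the road?", "Yes.", ["p1"]),
    QANode("pl1", "planning", "What should the ego car do about the pedestrian?",
           "Yield to the pedestrian.", ["pr1"]),
    QANode("b1", "behavior", "What is the ego car's behavior?", "Decelerate and stop.", ["pl1"]),
    QANode("m1", "motion", "What is the planned ego motion?",
           "Come to a stop before the crosswalk.", ["b1"]),
]

print(reasoning_order(nodes))  # ['p1', 'pr1', 'pl1', 'b1', 'm1']
print(build_context(nodes, "pr1"))
```

Answering nodes in topological order guarantees that every upstream answer exists before a dependent question is asked, which is one plausible way to pass perception and prediction results as context into downstream planning questions. A real QA graph may have nodes with several parents; the sketch handles that case too, although the exact ordering of independent nodes is then not fixed.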

📊 Comparison and Stats

DriveLM-Data is the first language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.

Links to details about the GVQA task, dataset features, and annotation.

License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

@article{sima2023drivelm,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
  journal={arXiv preprint arXiv:2312.14150},
  year={2023}
}
@misc{contributors2023drivelmrepo,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={DriveLM contributors},
  howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
  year={2023}
}

Other Resources

OpenDriveLab

Autonomous Vision Group