YuanJianhao508 / RAG-Driver

A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-to-end driving
Apache License 2.0

RAG-Driver


Official GitHub repository for "RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model", accepted at Robotics: Science and Systems (RSS) 2024.

Highlights

RAG-Driver is a Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity, designed for generalisable and explainable end-to-end driving with strong zero-shot generalisation.
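
For readers new to retrieval-augmented in-context learning, the general idea can be sketched as follows: embed the current driving clip, retrieve the most similar annotated examples from a memory, and place them in the prompt as demonstrations before the query. This is a generic illustration under stated assumptions, not RAG-Driver's implementation; the `MemoryEntry` structure, the toy 2-D embeddings, the cosine-similarity ranking and the prompt template are all hypothetical.

```python
# Hypothetical sketch of retrieval-augmented in-context example selection.
# Not taken from the RAG-Driver codebase; all names here are illustrative.
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """A hypothetical memory item: an embedding of a past driving clip,
    its human-written explanation, and a control signal as text."""
    embedding: list[float]
    explanation: str
    control: str


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: list[float], memory: list[MemoryEntry], k: int = 2) -> list[MemoryEntry]:
    """Return the k memory entries most similar to the query, most similar first."""
    ranked = sorted(memory, key=lambda e: cosine_similarity(query, e.embedding), reverse=True)
    return ranked[:k]


def build_prompt(examples: list[MemoryEntry], question: str) -> str:
    """Prepend the retrieved examples as in-context demonstrations, then the query."""
    lines = [f"Example {i}: {ex.explanation} Control: {ex.control}"
             for i, ex in enumerate(examples, start=1)]
    lines.append(f"Current clip: {question}")
    return "\n".join(lines)


# Toy usage: 2-D vectors stand in for real (hypothetical) clip embeddings.
memory = [
    MemoryEntry([1.0, 0.0], "The car slows down because the light is red.", "speed 2 m/s"),
    MemoryEntry([0.0, 1.0], "The car turns left to follow the road.", "course 15 deg"),
    MemoryEntry([0.9, 0.1], "The car stops because a pedestrian is crossing.", "speed 0 m/s"),
]
top = retrieve_top_k([1.0, 0.0], memory, k=2)
print(build_prompt(top, "What is the car doing and why?"))
```

Ranking with plain cosine similarity keeps the sketch short; a practical system would more likely use learned video embeddings and an approximate nearest-neighbour index.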

News

TODO List

Usage

Requirements and Installation

Instruction Tuning on BDD-X dataset

bash ./scripts/finetune.sh

Evaluation

bash ./scripts/batch_inference.sh

Evaluate Caption Performance

Please download the following files (open-sourced by ADAPT) and extract them all into the folder './evalcap'.

Then set the parameter 'version' in the script to the version of the prediction output file, and run:

python evaluate.py

Citations

If you find our paper and code useful in your research, please consider citing:

@article{yuan2024rag,
  title={RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model},
  author={Yuan, Jianhao and Sun, Shuyang and Omeiza, Daniel and Zhao, Bo and Newman, Paul and Kunze, Lars and Gadd, Matthew},
  journal={arXiv preprint arXiv:2402.10828},
  year={2024}
}

Acknowledgement

This repo is built on Video-LLaVA, ADAPT, and BDD-X. We thank all the authors for their open-source codebases and data!