
# πŸš— NexusAD

*Exploring the Nexus for Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving*

**⚠️ Note: The code is currently being updated. Stay tuned for more features and improvements.**

`ECCV 2024 Autonomous Driving Workshop` | **W-CODA 2024 Challenge** [Track 1](https://coda-dataset.github.io/w-coda2024/track1/) | **Corner Case Scene Understanding** [Leaderboard](https://coda-dataset.github.io/w-coda2024/track1/#Leaderboard)

[![Team Page](https://img.shields.io/badge/Project%20Page-8A2BE2)](https://openvisuallab.github.io/) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](#license) [![OpenReview](https://img.shields.io/badge/OpenReview-LXZO1nGI0d-b31b1b.svg)](https://openreview.net/forum?id=LXZO1nGI0d) [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-NexusAD-orange)](https://huggingface.co/OpenVisualLab/NexusAD)

## ✍️ Authors


## 🌟 Project Highlights

*Figure: NexusAD architecture overview*


## πŸ“° Latest News


## πŸš€ Quick Start

Follow these steps to start using NexusAD:

1. Clone the repository:

        git clone https://github.com/OpenVisualLab/NexusAD.git
        cd NexusAD

2. Install dependencies:

        pip install -r requirements.txt

3. Download the CODA-LM dataset and place it in the specified directory (see the path check after these steps).

4. Download the LoRA weights and place them in the `weights/` directory.

5. Run preprocessing, training, and evaluation:

        python preprocess.py --data_path <path-to-CODA-LM>
        python train.py --config config.json
        python evaluate.py --data_path <path-to-evaluation-set>
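The scripts assume the data and weights are already in place. Below is a minimal pre-flight check; `data/CODA-LM` is an assumed location used only for illustration, since the actual dataset root is whatever you pass via `--data_path`:

```python
# Hypothetical pre-flight check: "data/CODA-LM" is an assumed dataset
# location; weights/ is the LoRA weights directory from step 4.
from pathlib import Path

for p in (Path("data/CODA-LM"), Path("weights")):
    print(f"{p}: {'found' if p.exists() else 'missing'}")
```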

βš™οΈ Model Architecture

The NexusAD model architecture consists of the following components:

  1. Preliminary Visual Perception: Uses Grounding DINO for object detection and DepthAnything v2 for depth estimation, transforming spatial information into easily understandable structured text.

  2. Scene-aware Enhanced Retrieval Generation: Utilizes Retrieval-Augmented Generation (RAG) to retrieve and select relevant samples, enhancing understanding of complex driving scenarios.

  3. Driving Prompt Optimization: Uses Chain-of-Thought (CoT) prompting to generate context-aware, structured driving suggestions.

  4. Fine-tuning: Efficient parameter fine-tuning is performed using LoRA to optimize performance while saving computational resources.
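As a rough illustration of component 1, the sketch below serializes detector boxes and a depth map into structured text. The function name, input format, and output wording are assumptions for this sketch, not NexusAD's actual code:

```python
# Illustrative only: serialize detections + depth into structured text.
def perception_to_text(detections, depth_map):
    """detections: (label, (x1, y1, x2, y2), confidence) tuples;
    depth_map: 2-D list/array of per-pixel depth in meters."""
    lines = []
    for label, (x1, y1, x2, y2), conf in detections:
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2  # box center
        dist = depth_map[cy][cx]                 # depth at the center pixel
        lines.append(f"{label} (conf {conf:.2f}) at ~{dist:.1f} m, "
                     f"bbox [{x1}, {y1}, {x2}, {y2}]")
    return "\n".join(lines)

demo_depth = [[12.3] * 800 for _ in range(600)]  # dummy 800x600 depth map
print(perception_to_text([("pedestrian", (400, 220, 460, 380), 0.91)],
                         demo_depth))
```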
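For component 2, retrieval can be as simple as cosine similarity over precomputed scene embeddings. The embeddings and corpus below are random placeholders standing in for the real pipeline:

```python
# Scene-aware retrieval sketch: cosine similarity over scene embeddings.
# A real system would embed scenes with a vision-language encoder.
import numpy as np

def retrieve_similar(query_emb, corpus_embs, corpus_texts, k=2):
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    top = np.argsort(c @ q)[::-1][:k]  # indices of the k most similar scenes
    return [corpus_texts[i] for i in top]

rng = np.random.default_rng(0)
corpus = ["construction zone, cones ahead",
          "stalled truck on the shoulder",
          "debris in the ego lane"]
print(retrieve_similar(rng.normal(size=8), rng.normal(size=(3, 8)), corpus))
```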
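For component 3, a CoT prompt might interleave the structured scene text with the retrieved cases. This template is hypothetical, not the prompt NexusAD ships with:

```python
# Hypothetical CoT prompt template for driving suggestions.
COT_TEMPLATE = """You are analyzing a driving corner case.
Scene (structured text from perception):
{scene_text}

Similar retrieved cases:
{retrieved_cases}

Think step by step:
1. List the objects that could affect the ego vehicle.
2. Assess the risk each poses, using the estimated distances.
3. Give a structured driving suggestion (speed, lane, caution level).
"""

print(COT_TEMPLATE.format(
    scene_text="pedestrian (conf 0.91) at ~12.3 m",
    retrieved_cases="- stalled truck on the shoulder",
))
```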
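For component 4, a minimal LoRA setup with Hugging Face PEFT could look like the following. The `gpt2` checkpoint is a small placeholder so the sketch runs anywhere; NexusAD fine-tunes its vision-language backbone instead, and the rank/alpha values are illustrative, not the paper's settings:

```python
# Parameter-efficient fine-tuning sketch with Hugging Face PEFT.
# "gpt2" is a placeholder base model; hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=16,                        # adapter rank
    lora_alpha=32,               # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```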


## πŸ“Š Experimental Results

On the ECCV 2024 corner case understanding task, NexusAD outperformed the baseline models with a final score of 68.97:

| Model | General Perception | Regional Perception | Driving Suggestions | Final Score |
| --- | --- | --- | --- | --- |
| GPT-4V | 57.50 | 56.26 | 63.30 | 59.02 |
| CODA-VLM | 55.04 | 77.68 | 58.14 | 63.62 |
| InternVL-2.0-26B | 43.39 | 64.91 | 48.04 | 52.11 |
| **NexusAD (Ours)** | **57.58** | **84.31** | **65.02** | **68.97** |

## πŸ’‘ Contribution Guidelines

We welcome contributions of all kinds! Please see CONTRIBUTING.md for details on how to participate.


## πŸ“œ License & Citation

This project is licensed under the MIT License. If you find this project helpful in your research, please cite it as follows:

    @article{mo2024nexusad,
      title={NexusAD: Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving},
      author={Mo, Mengjingcheng and Wang, Jingxin and Wang, Like and Chen, Haosheng and Gu, Changjun and Leng, Jiaxu and Gao, Xinbo},
      journal={ECCV 2024 Autonomous Driving Workshop},
      year={2024}
    }

πŸ™ Acknowledgments

Special thanks to the following projects for providing key references and support for the development of NexusAD:

