EMZucas / minidrive

MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving

📑 arXiv link: https://arxiv.org/pdf/2409.07267

We are preparing for the open-source release.

Citation

To cite our work, please use the following BibTeX entry:

@article{zhang2024minidrive,
  title={MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving},
  author={Zhang, Enming and Dai, Xingyuan and Lv, Yisheng and Miao, Qinghai},
  journal={arXiv preprint arXiv:2409.07267},
  year={2024}
}