HJYao00 / DenseConnector

【NeurIPS 2024】Dense Connector for MLLMs
Apache License 2.0

[![zhihu](https://img.shields.io/badge/-知乎-000000?logo=zhihu&logoColor=0084FF)](https://zhuanlan.zhihu.com/p/700000183)

[Huanjin Yao](https://scholar.google.com/citations?user=pDtsCBQAAAAJ&hl=zh-CN)<sup>1,3*</sup>, [Wenhao Wu](https://whwu95.github.io/)<sup>2*✉️</sup>, Taojiannan Yang<sup>4</sup>, Yuxin Song<sup>3</sup>, [Mengxi Zhang](https://scholar.google.com/citations?user=73tAoEAAAAAJ&hl=en)<sup>3</sup>, Haocheng Feng<sup>3</sup>, Yifan Sun<sup>3</sup>, [Zhiheng Li](https://www.sigs.tsinghua.edu.cn/lzh/main.htm)<sup>1</sup>, [Wanli Ouyang](https://wlouyang.github.io/)<sup>5</sup>, [Jingdong Wang](https://jingdongwang2017.github.io/)<sup>3</sup>

<sup>1</sup>[Tsinghua University](https://www.tsinghua.edu.cn/en/), <sup>2</sup>[The University of Sydney](https://www.sydney.edu.au/), <sup>3</sup>[Baidu](https://vis.baidu.com/#/), <sup>4</sup>[AWS AI Labs](https://aws.amazon.com/ai/), <sup>5</sup>[CUHK](https://www.cuhk.edu.hk/english/index.html#)

<sup>*</sup>Equal Contribution, <sup>✉️</sup>Corresponding Author

Overview

We introduce the Dense Connector, a simple, effective, plug-and-play vision-language connector that significantly enhances existing MLLMs by leveraging multi-layer visual features at minimal additional computational cost. We hope this work offers useful experience and serves as a basic building block for future MLLM development.

The Dense Connector uses multi-layer visual features to enrich the visual representation and strengthen the visual perception of Multimodal Large Language Models (MLLMs), and it can be easily integrated into existing MLLMs. We provide three instantiations of the Dense Connector: Sparse Token Integration (STI), Sparse Channel Integration (SCI), and Dense Channel Integration (DCI). Of the three, Dense Channel Integration achieves the best results.
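
The README names the three instantiations without spelling out their mechanics, so the following is only a rough sketch of one plausible reading of the names: STI adds pooled features from earlier layers as extra tokens, SCI joins a few selected layers along the channel axis, and DCI averages groups of layers before joining them along the channel axis. The function names, the choice of layers, the group count, the 1-D token pooling, and the concatenation order are all assumptions made for illustration, and plain Python lists stand in for tensors so that the sketch runs without dependencies. It is not the repository's implementation, and any learned projection into the LLM embedding space is omitted.

```python
from typing import List, Sequence

Layer = List[List[float]]  # one layer's visual features: tokens x channels


def concat_channels(layers: Sequence[Layer]) -> Layer:
    """Concatenate layers token by token along the channel axis."""
    return [[v for row in token_rows for v in row] for token_rows in zip(*layers)]


def mean_layers(layers: Sequence[Layer]) -> Layer:
    """Element-wise average of layers that share one shape."""
    return [[sum(vals) / len(vals) for vals in zip(*token_rows)]
            for token_rows in zip(*layers)]


def pool_tokens(layer: Layer, k: int) -> Layer:
    """Average each run of k consecutive tokens (a 1-D stand-in for spatial pooling)."""
    chunks = [layer[i:i + k] for i in range(0, len(layer), k)]
    return [[sum(col) / len(chunk) for col in zip(*chunk)] for chunk in chunks]


def sti(layers: Sequence[Layer], picks: Sequence[int], k: int) -> Layer:
    """Sparse Token Integration (assumed): pooled picked layers become extra tokens."""
    extra = [tok for i in picks for tok in pool_tokens(layers[i], k)]
    return extra + list(layers[-1])


def sci(layers: Sequence[Layer], picks: Sequence[int]) -> Layer:
    """Sparse Channel Integration (assumed): picked layers + last layer, channel-wise."""
    return concat_channels([layers[i] for i in picks] + [layers[-1]])


def dci(layers: Sequence[Layer], groups: int) -> Layer:
    """Dense Channel Integration (assumed): per-group layer averages + last layer."""
    size = len(layers) // groups
    grouped = [mean_layers(layers[g * size:(g + 1) * size]) for g in range(groups)]
    return concat_channels(grouped + [layers[-1]])


# Toy input: 6 layers, 4 tokens, 2 channels; layer n is filled with the value n.
toy = [[[float(n)] * 2 for _ in range(4)] for n in range(1, 7)]
print(len(sti(toy, [1, 3], k=2)), len(sti(toy, [1, 3], k=2)[0]))  # tokens grow
print(len(sci(toy, [1, 3])), len(sci(toy, [1, 3])[0]))            # channels grow
print(dci(toy, groups=2)[0])                                      # one token's channels
```

Under this reading, STI lengthens the visual token sequence, while SCI and DCI keep the token count fixed and widen each token instead. DCI is the only variant in which every layer contributes to the output.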

Installation

Please follow the instructions below to install the required packages.

  1. Clone this repository

    git clone https://github.com/HJYao00/DenseConnector.git
    cd DenseConnector

  2. Install the package (run from the repository root entered in step 1)

    conda create -n dc python=3.10 -y
    conda activate dc
    pip install --upgrade pip
    pip install -e .

  3. Install additional packages for training Dense Connector

    pip install ninja
    pip install flash-attn --no-build-isolation
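
After step 3, a quick check that the training extras are visible to the active environment can save a failed training launch. The snippet below is a minimal sketch that uses only the Python standard library. The helper name `missing_modules` is hypothetical and not part of this repository. It relies on `flash-attn` being imported as `flash_attn` and `ninja` as `ninja`. The import name of the Dense Connector package itself is not stated here, so it is left out.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the training extras installed in step 3.
    missing = missing_modules(["ninja", "flash_attn"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the `dc` conda environment. Any name it reports as missing points to an install step that needs repeating.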

Dataset Preparation and Training

Please refer to the document for dataset preparation and training.

Evaluation

We evaluate the Dense Connector across 19 diverse benchmarks, including 11 image benchmarks and 8 video benchmarks. The testing procedures for both images and videos can be found here.

Model Zoo

Please visit our Model Zoo to access all publicly available Dense Connector checkpoints. We scale the LLM from 2.7B to 70B parameters, incorporating the latest open-source large language models, Llama3-8B-Instruct and Llama3-70B-Instruct.

Dialogue Example

We provide several dialogue examples, with additional results available in the paper.

(Figure: dialogue examples.)

Citation

If you find this repository useful, please consider starring 🌟 the repo and citing 🖇️ our paper.

@article{yao2024dense,
  title={Dense Connector for MLLMs},
  author={Yao, Huanjin and Wu, Wenhao and Yang, Taojiannan and Song, YuXin and Zhang, Mengxi and Feng, Haocheng and Sun, Yifan and Li, Zhiheng and Ouyang, Wanli and Wang, Jingdong},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}

Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA, Mini-Gemini and FreeVA.