DLYuanGod / TinyGPT-V

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
BSD 3-Clause "New" or "Revised" License

BLIP-2 / Q-Former / Benchmarks #8

Open Jotschi opened 9 months ago

Jotschi commented 9 months ago

Hi,

the ViT and the Q-Former are frozen. Which Q-Former model is being used by TinyGPT-V? BLIP-2 has a big zoo of models.

Furthermore, which models were used to benchmark against BLIP-2?

Is there a source for the benchmark harness?

Thank you

DLYuanGod commented 9 months ago

Hello and thank you for your question.

In the results section of the paper, we cite results for other models (including BLIP-2) from the MiniGPT report; our vision backbone and the MiniGPT-4 framework remain consistent with theirs.

For more detailed evaluation setups, please consult the MiniGPT-4 or MiniGPT-v2 papers at https://github.com/Vision-CAIR/MiniGPT-4.

Jotschi commented 9 months ago

Thanks. I found it in the code. `blip2_pretrained_flant5xxl.pth` is currently being used:

https://github.com/DLYuanGod/TinyGPT-V/blob/f433ff914ca457ec94be8aa0346f794b32be95a9/minigpt4/models/minigpt_v2.py#L70
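For reference, the pattern at that line follows the usual MiniGPT-4-style setup: load the pretrained Q-Former weights, then disable gradients so it stays fixed during training. Below is a minimal, hedged sketch of that freezing pattern; the `freeze` helper and the toy encoder are illustrative stand-ins, not the actual TinyGPT-V API (in the real code the weights come from `blip2_pretrained_flant5xxl.pth` via `load_state_dict`).

```python
# Sketch of the "frozen Q-Former" pattern discussed above.
# Assumption: a PyTorch module standing in for the BLIP-2 Q-Former;
# names here are hypothetical, not TinyGPT-V's real identifiers.
import torch.nn as nn


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients and switch to eval mode so the module is not trained."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()
    return module


# Toy stand-in for the Q-Former; in TinyGPT-V the checkpoint
# blip2_pretrained_flant5xxl.pth would be loaded into the real module first.
qformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
freeze(qformer)

print(all(not p.requires_grad for p in qformer.parameters()))  # True
```

With every parameter's `requires_grad` set to `False`, the optimizer skips the Q-Former entirely, which is what makes training only the small projection/backbone layers cheap.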