Thanks for your great work! I'm really interested in your work and have some questions here~
InternLM-XComposer2-VL-7B 🤗: The multi-task trained VLLM model with InternLM-7B as the initialization of the LLM for VL benchmarks and AI assistant. It ranks as the most powerful vision-language model based on 7B-parameter level LLMs, leading across 13 benchmarks.
InternLM-XComposer2-7B 🤗: The further instruction-tuned VLLM for Interleaved Text-Image Composition with free-form inputs.
The InternLM-XComposer2-7B description clearly suggests free-form interleaved inputs, in a format like:
Hello! <imageA> What's in the image, and in <imageB>, what can you see?
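For context, here is a minimal sketch of how I am currently constructing such an interleaved prompt. The placeholder tag style (`<imageN>`) and the helper function are my own assumptions, not something confirmed by the repo, so please correct me if the expected format differs:

```python
# Sketch: build a free-form interleaved prompt from alternating text/image
# segments. Each image is referenced inline by a numbered placeholder tag
# (an assumption on my part), and the image paths are returned in order.

def build_interleaved_prompt(segments):
    """segments: list of ("text", str) or ("image", path) tuples.
    Returns (prompt_string, ordered_image_paths)."""
    parts, images = [], []
    for kind, value in segments:
        if kind == "image":
            images.append(value)
            parts.append(f"<image{len(images)}>")  # inline placeholder
        else:
            parts.append(value)
    return " ".join(parts), images

prompt, imgs = build_interleaved_prompt([
    ("text", "Hello!"),
    ("image", "a.jpg"),
    ("text", "What's in the image, and in"),
    ("image", "b.jpg"),
    ("text", "what can you see?"),
])
# prompt -> "Hello! <image1> What's in the image, and in <image2> what can you see?"
```

Is this roughly the right shape of input for the composition model, or should the images be passed differently?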
But when should InternLM-XComposer2-VL-7B be used? Only with image-plus-question inputs for benchmarks, i.e., image comprehension without interleaved text? I'm confused about the input formats.
So, my questions are:
1. What are the input formats for InternLM-XComposer2 and InternLM-XComposer2-VL? For the VL version especially, can you give some example situations?
2. Where is the boundary between the two versions? If I select the wrong one, could it greatly hurt the results (for example, using InternLM-XComposer2 for benchmark testing, or the VL version for an interleaved composition task)?
3. Is InternLM-XComposer2 trained from InternLM-XComposer2-VL with extra interleaved data for instruction tuning? And how is the VL version itself trained?
Thanks for your great work again~