InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0
2.47k stars, 153 forks

Difference between the InternLM-XComposer2-VL and InternLM-XComposer2 models #200

Closed liuheng0111 closed 6 months ago

liuheng0111 commented 7 months ago

What is the difference between the InternLM-XComposer2-VL and InternLM-XComposer2 models? Is InternLM-XComposer2-VL the model produced by the Multi-task Training stage?

panzhang0212 commented 7 months ago

InternLM-XComposer2-VL: for VL benchmarks and AI assistant use. It ranks as the most powerful vision-language model among those based on 7B-parameter LLMs, leading across 13 benchmarks.

InternLM-XComposer2: the further instruction-tuned VLLM for interleaved text-image composition (图文创作) with free-form inputs.

shams2023 commented 7 months ago

InternLM-XComposer2-VL: for VL benchmarks and AI assistant use. It is rated the most powerful vision-language model among those based on 7B-parameter LLMs, leading across 13 benchmarks.

InternLM-XComposer2: the further instruction-tuned VLLM for interleaved text-image composition (图文创作) with free-form inputs.

So if I want to obtain a textual description of an image (i.e. perform an image captioning task), I should use the InternLM-XComposer2-VL model, right?

LightDXY commented 6 months ago

Yes, InternLM-XComposer2-VL is better suited to this task.
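
For readers who want to try the captioning use case discussed above, here is a minimal sketch in Python. It is not taken from this thread. The checkpoint name `internlm/internlm-xcomposer2-vl-7b`, the `<ImageHere>` placeholder, and the `model.chat(...)` call follow the usage shown on the model card as I understand it, and should be treated as assumptions to check against the current README. `load_model` and `caption_image` are hypothetical helper names introduced only for this example.

```python
from typing import Any

# Assumed Hugging Face checkpoint name; it is not stated in this thread.
MODEL_ID = "internlm/internlm-xcomposer2-vl-7b"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model (hypothetical helper).

    Heavy imports are kept local so caption_image() can be exercised
    without torch or a GPU installed.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id, torch_dtype=torch.float16, trust_remote_code=True
    ).cuda().eval()
    return tokenizer, model


def caption_image(model: Any, tokenizer: Any, image_path: str,
                  prompt: str = "Please describe this image in detail.") -> str:
    """Ask the model for a free-form description of one image."""
    # "<ImageHere>" marks where the image is placed in the query
    # (assumed from the model card convention).
    query = f"<ImageHere>{prompt}"
    # chat() comes from the checkpoint's remote code, not from transformers
    # itself; it is assumed to return a (response, history) pair.
    response, _history = model.chat(
        tokenizer, query=query, image=image_path, history=[], do_sample=False
    )
    return response.strip()


# Example usage (needs a GPU and downloads the weights):
#   tokenizer, model = load_model()
#   print(caption_image(model, tokenizer, "example.jpg"))
```

Passing `do_sample=False` keeps decoding greedy, so repeated calls on the same image should give the same caption, which is convenient when comparing checkpoints.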

guikunchen commented 6 months ago

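@LightDXY @panzhang0212 How should the training that goes from InternLM-XComposer2-VL to InternLM-XComposer2 be done? Is it the code at https://github.com/InternLM/InternLM-XComposer/blob/main/finetune ? If so: 1. Set the pretrained path to the path of the VL version. 2. Should the image size be set to 224 or 490? I'm not sure why the two versions deliberately use different image sizes. Thanks 🙏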