The difference between the InternLM-XComposer2-VL and InternLM-XComposer2 models

The difference between the InternLM-XComposer2-VL and InternLM-XComposer2 models #200

liuheng0111 closed this issue 6 months ago

liuheng0111 commented 7 months ago

What is the difference between the InternLM-XComposer2-VL and InternLM-XComposer2 models? Is InternLM-XComposer2-VL the model produced by the Multi-task Training stage?

panzhang0212 commented 7 months ago

InternLM-XComposer2-VL: intended for vision-language (VL) benchmarks and AI-assistant use. It ranks as the most powerful vision-language model built on 7B-parameter-level LLMs, leading across 13 benchmarks.

InternLM-XComposer2: the further instruction-tuned VLLM for interleaved text-image composition with free-form inputs.

shams2023 commented 7 months ago

So if I want to obtain a textual description of an image (i.e. perform an image captioning task), I should use the InternLM-XComposer2-VL model, right?

LightDXY commented 6 months ago

Yes, InternLM-XComposer2-VL is better suited to this task.
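For reference, a minimal captioning sketch with the VL model might look like the following. It is not taken from this thread: the Hugging Face model ID, the `<ImageHere>` placeholder, and the `chat(...)` keyword arguments are assumptions based on the public model card, and the helper names (`build_caption_query`, `load_vl_model`, `caption_image`) are hypothetical. Please check the repository README before relying on it.

```python
from typing import Any, Tuple

# Assumed Hugging Face model ID for the VL checkpoint.
MODEL_ID = "internlm/internlm-xcomposer2-vl-7b"
# Assumed image placeholder token expected at the start of the query.
IMAGE_TOKEN = "<ImageHere>"


def build_caption_query(instruction: str = "Please describe this image in detail.") -> str:
    """Prefix the text instruction with the image placeholder token."""
    return f"{IMAGE_TOKEN}{instruction}"


def load_vl_model(model_id: str = MODEL_ID) -> Tuple[Any, Any]:
    """Load model and tokenizer; needs torch, transformers, and a CUDA GPU."""
    # Heavy imports are kept local so the helpers above work without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    torch.set_grad_enabled(False)  # inference only
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True).cuda().eval()
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer


def caption_image(model: Any, tokenizer: Any, image_path: str) -> str:
    """Ask the model for a caption through its `chat` method (assumed signature)."""
    response, _history = model.chat(
        tokenizer,
        query=build_caption_query(),
        image=image_path,
        history=[],
        do_sample=False,  # greedy decoding for a stable description
    )
    return response
```

With a GPU available, usage would be roughly `model, tokenizer = load_vl_model()` followed by `print(caption_image(model, tokenizer, "./example.jpg"))`, where `./example.jpg` is a placeholder path.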

guikunchen commented 6 months ago

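@LightDXY @panzhang0212 How should the training that takes InternLM-XComposer2-VL to InternLM-XComposer2 be done? Is the code at https://github.com/InternLM/InternLM-XComposer/blob/main/finetune the right one to use? If so: 1. I would set the pretrained path to the VL version's path. 2. Should the image size be set to 224 or 490? I'm not sure why the two versions deliberately use different image sizes. Thanks 🙏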