InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Interleaved inputs for InternLM-XC-4KHD #266

Closed: ys-zong closed this issue 2 months ago

ys-zong commented 2 months ago

Hi, thanks for the nice work! Does InternLM-4KHD support interleaved image-text inputs (e.g. multiple images) for inference, the way InternLM-XComposer does?

LightDXY commented 2 months ago

Hi, the model has this capability but is not good at it, because we did not train it on interleaved data.

myownskyW7 commented 2 months ago

@ys-zong Hi, you could try concatenating the multiple images into a single large image and asking the question about that image.
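
For anyone who wants to try the concatenation workaround, here is a minimal sketch of one way to do it. It is not code from this repository: the helper names `horizontal_layout` and `concat_images`, the side-by-side layout, the white background, and the file names are all assumptions made for illustration, and it assumes Pillow is installed.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def horizontal_layout(sizes: List[Size], gap: int = 0) -> Tuple[Size, List[Tuple[int, int]]]:
    """Return the canvas size and the top-left paste offset for each image.

    Hypothetical helper: images are placed left to right, top-aligned,
    with `gap` pixels between neighbours.
    """
    if not sizes:
        raise ValueError("need at least one image")
    offsets = []
    x = 0
    for width, _height in sizes:
        offsets.append((x, 0))
        x += width + gap
    canvas_width = x - gap  # no gap after the last image
    canvas_height = max(height for _width, height in sizes)
    return (canvas_width, canvas_height), offsets


def concat_images(paths: List[str], gap: int = 0, background=(255, 255, 255)):
    """Paste the images at `paths` side by side onto one large RGB canvas."""
    # Imported here so the layout helper above works without Pillow.
    from PIL import Image

    images = [Image.open(p).convert("RGB") for p in paths]
    canvas_size, offsets = horizontal_layout([im.size for im in images], gap)
    canvas = Image.new("RGB", canvas_size, background)
    for im, offset in zip(images, offsets):
        canvas.paste(im, offset)
    return canvas
```

Something like `concat_images(["a.jpg", "b.jpg"], gap=16).save("combined.jpg")` (placeholder file names) would then give a single image to pass through the usual single-image inference path. When asking the question, it may help to refer to the images by position, e.g. "the left image" and "the right image".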