InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

How to do VQA task by image url with <internlm-xcomposer2-4khd-7b>? #255

Closed: XKCUW closed this issue 2 months ago

XKCUW commented 2 months ago

I can manually get the embedding of an image from a URL with the "vis_processor" method on internlm-xcomposer2-7b (refer: https://huggingface.co/internlm/internlm-xcomposer2-7b). However, when I tried the same approach on internlm-xcomposer2-4khd-7b, it doesn't work. Could you give an example of how to solve this? @panzhang0212 @yhcao6
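Since the 4KHD model's chat interface expects a local image file rather than a raw URL, one workaround is to download the URL to a temporary file first and pass that path to the model. This is a sketch, not code from the repo; `url_to_local_path` is a hypothetical helper name:

```python
import tempfile
import urllib.request


def url_to_local_path(url: str, suffix: str = ".jpg") -> str:
    """Download an image from `url` to a temporary file and return its path.

    The returned path can then be passed wherever the model expects a
    local image file.
    """
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    # delete=False so the file survives after close and can be re-opened
    # by the image loader / vis_processor.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    tmp.write(data)
    tmp.close()
    return tmp.name
```

Because `urllib.request.urlopen` also accepts `file://` URLs, the same helper works for local files, which makes it easy to test without network access.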

LightDXY commented 2 months ago

https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/vlm/xcomposer/xcomposer2_4KHD.py#L60 With this function you can freely define the text, the image, and the image resolution. The repo also supports many mainstream VQA benchmarks.
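Putting the pieces together, a single VQA turn against the 4KHD model might look like the sketch below. The `chat` signature (`image` as a local path, `hd_num`, `history`, `do_sample`) and the `<ImageHere>` placeholder are my reading of the Hugging Face model card and the VLMEvalKit wrapper linked above, so verify them against those sources before relying on this:

```python
IMAGE_TOKEN = "<ImageHere>"  # placeholder the XComposer2 prompts use for the image


def build_query(question: str) -> str:
    """Prepend the image placeholder token to a free-form question."""
    return IMAGE_TOKEN + question


def vqa_from_local_image(image_path: str, question: str, hd_num: int = 25) -> str:
    """Run one VQA turn with internlm-xcomposer2-4khd-7b (assumed API)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "internlm/internlm-xcomposer2-4khd-7b"
    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # the model ships custom modeling code
    ).cuda().eval()
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

    response, _ = model.chat(
        tokenizer,
        query=build_query(question),
        image=image_path,  # local file path, not a URL
        hd_num=hd_num,     # number of HD sub-images; assumed parameter
        history=[],
        do_sample=False,
    )
    return response
```

Download the URL to a local file first (for example with a small `urllib.request` helper), then call `vqa_from_local_image(path, "What is in this image?")`.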