InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

example_chat.py does not support multi-images as input? #212

Open tiesanguaixia opened 8 months ago

tiesanguaixia commented 8 months ago

Thank you for the great work! Inference on Multiple GPUs in the README calls example_chat.py, but it seems the code does not support multiple images as input. When I organize 2 images as described in Data preparation in the finetuning guidance, an error occurs:

```
File "/playground/InternLM-XComposer/examples/example_chat.py", line 38, in <module>
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
  File "/root/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm/internlm-xcomposer2-vl-7b/a52d70f582fa5773dd7b297f3e1a4caf149dcf59/modeling_internlm_xcomposer2.py", line 500, in chat
    image = self.encode_img(image)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm/internlm-xcomposer2-vl-7b/a52d70f582fa5773dd7b297f3e1a4caf149dcf59/modeling_internlm_xcomposer2.py", line 116, in encode_img
    assert isinstance(image, torch.Tensor)
AssertionError
```

So once finetuning following the guidance is finished (each sample in the JSON file consists of one or multiple images), how can the performance be evaluated on a new dataset in which each sample likewise consists of one or multiple images? Thank you in advance!

zhuchenxi commented 3 months ago

Same question.