InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

Error when running the demo #358

Open yusirhhh opened 1 month ago

yusirhhh commented 1 month ago

```
$ python others/test_diff_vlm/InternLM_XComposer.py
Set max length to 16384
Loading checkpoint shards: 100%|████████████████████████████████| 3/3 [00:04<00:00, 1.47s/it]
Traceback (most recent call last):
  File "/mnt/data/mmyu/eqa/explore-eqa/others/test_diff_vlm/InternLMXComposer.py", line 19, in <module>
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
  File "/home/mmyu/anaconda3/envs/eval_cog/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/mmyu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b/modeling_internlm_xcomposer2.py", line 594, in chat
    inputs, im_mask, _ = self.interleav_wrap_chat(query, image, history=history, meta_instruction=meta_instruction, hd_num=hd_num)
  File "/home/mmyu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b/modeling_internlm_xcomposer2.py", line 273, in interleav_wrap_chat
    img = self.encode_img(image[idx], hd_num)
  File "/home/mmyu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b/modeling_internlm_xcomposer2.py", line 164, in encode_img
    image = Image_transform(image, hd_num=hd_num)
  File "/home/mmyu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b/ixc_utils.py", line 46, in Image_transform
    img = padding_336(img, 560)
  File "/home/mmyu/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2d5-7b/ixc_utils.py", line 24, in padding_336
    b = transforms.functional.pad(b, [left_padding, top_padding, right_padding, bottom_padding], fill=[255,255,255])
  File "/home/mmyu/anaconda3/envs/eval_cog/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 516, in pad
    return F_pil.pad(img, padding=padding, fill=fill, padding_mode=padding_mode)
  File "/home/mmyu/anaconda3/envs/eval_cog/lib/python3.9/site-packages/torchvision/transforms/_functional_pil.py", line 175, in pad
    opts = _parse_fill(fill, img, name="fill")
  File "/home/mmyu/anaconda3/envs/eval_cog/lib/python3.9/site-packages/torchvision/transforms/_functional_pil.py", line 271, in _parse_fill
    raise ValueError(msg.format(len(fill), num_channels))
ValueError: The number of elements in 'fill' does not match the number of channels of the image (3 != 4)
```
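The final `ValueError` means the input image carries an alpha channel: `padding_336` passes a 3-element RGB fill to torchvision's `pad`, but the PNG decodes to 4 channels (RGBA). A quick way to confirm this for a suspect image (a minimal sketch using only Pillow; the in-memory image here stands in for your file):

```python
from PIL import Image

# An RGBA image reports 4 bands; the model's fill=[255, 255, 255]
# has only 3 elements, which triggers the "3 != 4" ValueError.
img = Image.new("RGBA", (8, 8))
print(img.mode, len(img.getbands()))  # → RGBA 4
```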

simplelifetime commented 1 month ago

Same problem

MarcoFerreiraPerson commented 1 month ago

Same problem

zTaoplus commented 1 month ago

Same issue here, but I was inspired by this. I converted my image to RGB with the following code, and `example_chat.py` then ran successfully:

```python
from PIL import Image

img = Image.open('test.png')
img = img.convert("RGB")  # drop the alpha channel
img.save("test-rgb.png")
```

Hope this helps.