OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

[BUG] Input Error: Only 4D input Tensors are supported (got 3D) #359

Closed: northkd closed this issue 1 month ago

northkd commented 2 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

After running the multi-turn chat test code:

```python
from chat import MiniCPMVChat, img2base64
import torch
import json

torch.manual_seed(0)

chat_model = MiniCPMVChat('openbmb/MiniCPM-Llama3-V-2_5')

im_64 = img2base64('./assets/airplane.jpeg')

# First round
msgs = [{"role": "user", "content": "Tell me the model of this aircraft."}]

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)

# Second round: append the previous answer and a follow-up question
msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Introduce something about Airbus A380."})

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)
```

I got the following error:

```
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 7/7 [00:06<00:00, 1.11it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/data/wzy/LLM/MiniCPM-V-main/test.py", line 54, in <module>
    answer = chat_model.chat(inputs)
  File "/data/wzy/LLM/MiniCPM-V-main/chat.py", line 197, in chat
    return self.model.chat(input)
  File "/data/wzy/LLM/MiniCPM-V-main/chat.py", line 177, in chat
    answer = self.model.chat(
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 412, in chat
    images.append(self.reshape_by_patch(slice_image))
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 304, in reshape_by_patch
    patches = torch.nn.functional.unfold(
  File "/root/anaconda3/envs/CV-LLM/lib/python3.10/site-packages/torch/nn/functional.py", line 4666, in unfold
    raise NotImplementedError("Input Error: Only 4D input Tensors are supported (got {}D)".format(input.dim()))
NotImplementedError: Input Error: Only 4D input Tensors are supported (got 3D)
```
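The exception is raised by `torch.nn.functional.unfold`, which in the PyTorch build shown in the traceback accepts only batched 4D input of shape [N, C, H, W], while `reshape_by_patch` handed it a 3D tensor (presumably a single [C, H, W] image slice). The sketch below is a minimal illustration under that assumption, not the MiniCPM-V implementation: `patchify` is a hypothetical helper showing the shape requirement and the usual workaround of adding a batch dimension before `unfold` and removing it afterwards. Whether a bare 3D call succeeds depends on the installed PyTorch version, so the demo only reports what happens.

```python
import torch
import torch.nn.functional as F


def patchify(image: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split a [C, H, W] image into non-overlapping patch columns.

    Hypothetical helper for illustration only (not MiniCPM-V code).
    Returns shape [C * patch_size * patch_size, L], L = number of patches.
    """
    if image.dim() != 3:
        raise ValueError(f"expected a [C, H, W] tensor, got {image.dim()}D")
    # The PyTorch in the traceback rejects non-4D input to F.unfold,
    # so add a batch dimension: [C, H, W] -> [1, C, H, W].
    batched = image.unsqueeze(0)
    cols = F.unfold(batched, kernel_size=patch_size, stride=patch_size)
    # Drop the batch dimension again: [1, C*k*k, L] -> [C*k*k, L].
    return cols.squeeze(0)


image = torch.rand(3, 28, 42)  # arbitrary C=3, H=28, W=42

# Probe the installed PyTorch: some versions accept unbatched 3D input.
try:
    F.unfold(image, kernel_size=14, stride=14)
    print("this PyTorch accepts 3D input to F.unfold")
except (NotImplementedError, RuntimeError) as err:
    print("this PyTorch rejects 3D input:", err)

cols = patchify(image, patch_size=14)
print(cols.shape)  # 3*14*14 = 588 rows, (28/14)*(42/14) = 6 patches
```

If, as the traceback suggests, the model's own code is what passes the 3D tensor, changing the calling script alone will not help; comparing the installed PyTorch version with the one the project recommends (the Environment section below was left empty) may be a reasonable next step.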

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

LDLINGLINGLING commented 2 months ago

You could check the shape of the im_64 you are passing in, to see whether something went wrong when the image was read.
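Since `im_64` is base64-encoded data rather than a tensor, one way to act on this suggestion is to decode it and inspect the resulting image. The sketch below assumes `img2base64` returns the base64 encoding of the image file's bytes (as `str` or `bytes`) and uses Pillow; `inspect_base64_image` is a hypothetical helper, and the demo builds a small JPEG in memory instead of reading `./assets/airplane.jpeg` so that it runs on its own.

```python
import base64
import io

from PIL import Image


def inspect_base64_image(im_64):
    """Decode base64 image data and return (format, mode, (width, height)).

    Hypothetical helper for illustration; assumes im_64 is the base64
    encoding of an image file's bytes, given as str or bytes.
    """
    raw = base64.b64decode(im_64)
    with Image.open(io.BytesIO(raw)) as img:
        img.load()  # force a full decode so a truncated file fails here
        return img.format, img.mode, img.size


# Stand-in for img2base64('./assets/airplane.jpeg'): encode a small
# in-memory JPEG the same way.
buf = io.BytesIO()
Image.new("RGB", (64, 48), color=(200, 30, 30)).save(buf, format="JPEG")
demo_64 = base64.b64encode(buf.getvalue())

print(inspect_base64_image(demo_64))
```

If the call raises, or reports an unexpected format or size for the real file, the image was not read correctly; if it looks normal, the problem more likely lies downstream of image loading.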

northkd commented 2 months ago


I'm using exactly the default multi-turn chat code from the README. If it runs fine for the authors, it shouldn't cause any problems, right?

Cuiunbo commented 1 month ago

Hi, you can try following our latest tutorial: https://huggingface.co/openbmb/MiniCPM-V-2_6#chat-with-multiple-images