[BUG] Error when running the multi-turn conversation demo

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

No response

Expected Behavior

No response

Steps To Reproduce

Run the multi-turn conversation demo from the README:


from chat import MiniCPMVChat, img2base64
import torch
import json

torch.manual_seed(0)

chat_model = MiniCPMVChat('openbmb/MiniCPM-Llama3-V-2_5')

im_64 = img2base64('./assets/airplane.jpeg')

# First round chat 
msgs = [{"role": "user", "content": "Tell me the model of this aircraft."}]

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)  # this call raises the error
print(answer)

# Second round chat 
# pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Introduce something about Airbus A380."})

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)

Environment

- OS: Linux
- Python: 3.9
- Transformers: 4.41.0
- PyTorch: 1.12.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 10.2

Anything else?

NotImplementedError                       Traceback (most recent call last)
Cell In[4], line 4
      1 msgs = [{"role": "user", "content": "图片是什么内容."}]
      3 inputs = {"image": im_64, "question": json.dumps(msgs)}
----> 4 answer = chat_model.chat(inputs)
      5 print(answer)

File ~/work/MiniCPM-V-main/chat.py:197, in MiniCPMVChat.chat(self, input)
    196 def chat(self, input):
--> 197     return self.model.chat(input)

File ~/work/MiniCPM-V-main/chat.py:177, in MiniCPMV2_5.chat(self, input)
    173         return "Image decode error"
    175     msgs = json.loads(input['question'])
--> 177     answer = self.model.chat(
    178         image=image,
    179         msgs=msgs,
    180         tokenizer=self.tokenizer,
    181         sampling=True,
    182         temperature=0.7
    183     )
    184     return answer

File ~/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py:416, in MiniCPMV.chat(self, image, msgs, tokenizer, vision_hidden_states, max_new_tokens, sampling, max_inp_length, system_prompt, stream, **kwargs)
    414         slice_image = self.transform(slice_image)
    415         H, W = slice_image.shape[1:]
--> 416         images.append(self.reshape_by_patch(slice_image))
    417         tgt_sizes.append(torch.Tensor([H // self.config.patch_size, W // self.config.patch_size]).type(torch.int32))
    418 else:

File ~/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py:308, in MiniCPMV.reshape_by_patch(self, image_tensor)
    303 patch_size = self.config.patch_size
    304 #print(image_tensor)
    305 #print(image_tensor.shape)
    306 #image_tensor = image_tensor.reshape(1,*image_tensor.shape)
    307 #patches = image_tensor
--> 308 patches = torch.nn.functional.unfold(
    309     image_tensor,
    310     (patch_size, patch_size),
    311     stride=(patch_size, patch_size)
    312 )
    314 patches = patches.reshape(image_tensor.size(0), patch_size, patch_size, -1)
    315 patches = patches.permute(0, 1, 3, 2).reshape(image_tensor.size(0), patch_size, -1)

File ~/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/functional.py:4666, in unfold(input, kernel_size, dilation, padding, stride)
   4664     return torch._C._nn.im2col(input, _pair(kernel_size), _pair(dilation), _pair(padding), _pair(stride))
   4665 else:
-> 4666     raise NotImplementedError("Input Error: Only 4D input Tensors are supported (got {}D)".format(input.dim()))

NotImplementedError: Input Error: Only 4D input Tensors are supported (got 3D)
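
The traceback shows `torch.nn.functional.unfold` rejecting the 3D `(C, H, W)` tensor that `reshape_by_patch` passes in. On PyTorch 1.12.0 this function checks for a 4D batched input. Later PyTorch releases appear to accept unbatched 3D input, so upgrading PyTorch may be the simpler route, though that is an assumption and has not been verified against this setup. As a possible workaround on the installed version, the sketch below adds a batch dimension before `unfold` and removes it afterwards. It is a standalone function that mirrors the lines visible in the traceback. Taking `patch_size` as a parameter (the model code reads it from `self.config.patch_size`) and the test shape are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def reshape_by_patch(image_tensor: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Rearrange a (C, H, W) image into (C, patch_size, num_patches * patch_size).

    Hypothetical standalone variant of the method shown in the traceback;
    the only intended change is the explicit batch dimension around unfold.
    """
    # unfold on PyTorch 1.12 only accepts 4D input, so add a batch dim of 1.
    patches = F.unfold(
        image_tensor.unsqueeze(0),
        (patch_size, patch_size),
        stride=(patch_size, patch_size),
    )
    # (1, C * p * p, L) -> (C * p * p, L)
    patches = patches.squeeze(0)

    # Same reshaping steps as in the original method.
    channels = image_tensor.size(0)
    patches = patches.reshape(channels, patch_size, patch_size, -1)
    patches = patches.permute(0, 1, 3, 2).reshape(channels, patch_size, -1)
    return patches


if __name__ == "__main__":
    # Assumed example: a 3-channel 28x28 image split into 14x14 patches.
    image = torch.arange(3 * 28 * 28, dtype=torch.float32).reshape(3, 28, 28)
    print(reshape_by_patch(image, 14).shape)
```

If this approach is used, the change would go into the `reshape_by_patch` method of the cached `modeling_minicpmv.py` named in the traceback. That file may be overwritten when the model code is downloaded again.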

OpenBMB / MiniCPM-V

[BUG] Error when running the multi-turn conversation demo #264
