[BUG] MiniCPM-Llama3-V-2_5 inference fails on Mac

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

Below is my code, which loads the MiniCPM-Llama3-V-2_5 model from a local path:

import os

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_path = "/Volumes/models/MiniCPM-Llama3-V-2_5"
pic_path = "/Volumes/dataset/dashboard/2_transformed.png"

model = AutoModel.from_pretrained(model_path, trust_remote_code=True, low_cpu_mem_usage=True)
model = model.to(device="mps")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

image = Image.open(pic_path).convert('RGB')
question = '图片中的异常信息是什么'  # "What is the abnormal information in the image?"
msgs = [{'role': 'user', 'content': question}]

default_kwargs = dict(
    max_new_tokens=896,
    sampling=False,
    num_beams=3,
)

answer = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    **default_kwargs
)
print(answer)

In the PyCharm run configuration, PYTORCH_ENABLE_MPS_FALLBACK is already set to 1.
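To rule out the IDE run configuration as a factor, the same flag can also be set from inside the script. This is only a minimal sketch: it assumes the lines sit at the very top of the file, before `import torch` (a common recommendation, not something verified for this setup), and `mps_fallback_enabled` is a hypothetical helper used for the check.

```python
import os

# Assumption: this runs before `import torch`, so the flag is already
# present in the process environment when PyTorch is loaded.
# setdefault keeps any value that the shell or IDE has already provided.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def mps_fallback_enabled() -> bool:
    """Return True if the fallback flag is set to "1" in this process."""
    return os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"


print("MPS fallback enabled:", mps_fallback_enabled())
```

Printing the value at startup confirms what the Python process actually sees, independent of how PyCharm launches it.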

### Running the code above on a Mac M2 Ultra fails at model.chat with the following error. How can this be resolved?

Error: command buffer exited with error status.
    The Metal Performance Shaders operations encoded on it may not have completed.
    Error: 
    (null)
    Internal Error (0000010a:Internal Error)
    <AGXG14XFamilyCommandBuffer: 0x3164799f0>
    label = <none> 
    device = <AGXG14DDevice: 0x119535800>
        name = Apple M2 Ultra 
    commandQueue = <AGXG14XFamilyCommandQueue: 0x109723c00>
        label = <none> 
        device = <AGXG14DDevice: 0x119535800>
            name = Apple M2 Ultra 
    retainedReferences = 1
Traceback (most recent call last):
  File "/Volumes/M2SSD/Space/Workspace/llm/MiniCPM-V/app/recognize_pic.py", line 28, in <module>
    answer = model.chat(
             ^^^^^^^^^^^
  File "/Users/guangqu/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 454, in chat
    res, vision_hidden_states = self.generate(
                                ^^^^^^^^^^^^^^
  File "/Users/guangqu/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 354, in generate
    ) = self.get_vllm_embedding(model_inputs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 99, in get_vllm_embedding
    vision_embedding = self.vpm(all_pixel_values.type(dtype), patch_attention_mask=patch_attn_mask).last_hidden_state
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/anaconda3/envs/minicpm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/anaconda3/envs/minicpm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/anaconda3/envs/minicpm/lib/python3.11/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 720, in forward
    hidden_states = self.embeddings(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/anaconda3/envs/minicpm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/anaconda3/envs/minicpm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/guangqu/anaconda3/envs/minicpm/lib/python3.11/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 183, in forward
    fractional_coords_h = torch.arange(0, 1 - 1e-6, 1 / nb_patches_h)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: step must be nonzero
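
One possible reading of the last frame: the step passed to `torch.arange` is `1 / nb_patches_h`, so it can only be zero if `nb_patches_h` is infinite or so large that its reciprocal underflows, which might be a side effect of the failed Metal command buffer reported above. The sketch below is a pure-Python stand-in for that call, not the Idefics2 code; `fractional_coords` is a hypothetical helper written for illustration.

```python
import math


def fractional_coords(nb_patches: float) -> list[float]:
    """Pure-Python stand-in for torch.arange(0, 1 - 1e-6, 1 / nb_patches).

    Note: for nb_patches == 0 Python raises ZeroDivisionError, whereas a
    tensor division would yield inf, so that case is not modelled here.
    """
    step = 1 / nb_patches
    if step == 0:
        # Same message as in the traceback, raised for the same condition.
        raise RuntimeError("step must be nonzero")
    coords = []
    i = 0
    while i * step < 1 - 1e-6:
        coords.append(i * step)
        i += 1
    return coords


# A sane patch count gives one coordinate per patch.
print(fractional_coords(4))

# A non-finite patch count makes the step exactly zero.
try:
    fractional_coords(math.inf)
except RuntimeError as exc:
    print("RuntimeError:", exc)
```

If this reading is right, comparing the value of `nb_patches_h` on CPU and on MPS for the same image would show whether the patch count is being corrupted on the device.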

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS: macOS 14.5
- Python: python3.11
- Transformers: 4.40.0
- PyTorch: 2.1.2
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): MPS
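
For anyone trying to reproduce this, the versions above can be collected with a short script. This is a sketch using only the standard library; `package_version` and `environment_report` are hypothetical helper names, and package versions are read from installed metadata rather than by importing the packages.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict[str, str]:
    """Collect the fields requested by the issue template."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "transformers": package_version("transformers"),
        "torch": package_version("torch"),
    }


for key, value in environment_report().items():
    print(f"- {key}: {value}")
```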

Anything else?

No response

OpenBMB / MiniCPM-V

[BUG] MiniCPM-Llama3-V-2_5 inference fails on Mac #294