OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0

[BUG] "NotImplementedError: Cannot copy out of meta tensor; no data!" when running on two GPUs with device='auto' #292

Closed bailove closed 1 week ago

bailove commented 1 week ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Running the demo on two GPUs, inference fails with "NotImplementedError: Cannot copy out of meta tensor; no data!". How should I handle this error? Thanks.

```
<User>: hello

Cannot copy out of meta tensor; no data!
Traceback (most recent call last):
  File "/home/mm39/ai/MiniCPM-V/web_demo_2.5.py", line 149, in chat
    answer = model.chat(
  File "/home/mm39/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/45387f99a455e11801b78a0b24811856688e0c8b/modeling_minicpmv.py", line 454, in chat
    res, vision_hidden_states = self.generate(
  File "/home/mm39/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/45387f99a455e11801b78a0b24811856688e0c8b/modeling_minicpmv.py", line 354, in generate
    ) = self.get_vllm_embedding(model_inputs)
  File "/home/mm39/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/45387f99a455e11801b78a0b24811856688e0c8b/modeling_minicpmv.py", line 99, in get_vllm_embedding
    vision_embedding = self.vpm(all_pixel_values.type(dtype), patch_attention_mask=patch_attn_mask).last_hidden_state
  File "/home/mm39/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/mm39/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mm39/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 161, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/mm39/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 356, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(
  File "/home/mm39/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 186, in send_to_device
    {
  File "/home/mm39/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 187, in <dictcomp>
    k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
  File "/home/mm39/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 158, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

<Assistant>: Error, please retry
```
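
For context, a "meta" tensor in PyTorch carries only shape and dtype metadata and owns no storage; accelerate parks not-yet-loaded weights on the meta device, so this error means some tensor in the forward pass was never materialized on a real device. A minimal standalone reproduction, independent of this repo:

```python
import torch

# A meta tensor has shape and dtype but no underlying storage.
t = torch.empty(2, 2, device="meta")

# Copying it to a real device fails with the same message as in the traceback:
# NotImplementedError: Cannot copy out of meta tensor; no data!
t.to("cpu")
```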

Expected Behavior

Inference should return a normal response.

Steps To Reproduce

python3 web_demo_2.5.py --device auto


Relevant portion of web_demo_2.5.py:

```python
import argparse
import os

import torch
from transformers import AutoModel, AutoTokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

parser = argparse.ArgumentParser(description='demo')
parser.add_argument('--device', type=str, default='cuda', help='cuda, mps, or auto')
args = parser.parse_args()
device = args.device
assert device in ['cuda', 'mps', 'auto']

# Load model
model_path = 'openbmb/MiniCPM-Llama3-V-2_5'
# model_path = 'scomper/minicpm-v2.5'
if 'int4' in model_path:
    if device == 'mps':
        print('Error: running int4 model with bitsandbytes on Mac is not supported right now.')
        exit()
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
else:
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                      torch_dtype=torch.float16, device_map=device)
    # model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()
```
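
One quick diagnostic (a sketch, assuming the model loads far enough to construct): models loaded with a device_map expose the resolved placement as model.hf_device_map, and any submodule left on the meta device will trigger exactly this failure when its inputs are moved.

```python
# Print where accelerate actually placed each submodule.
print(model.hf_device_map)

# Cross-check the parameters themselves; anything still on the meta device
# was never materialized and will fail when touched in a forward pass.
for name, param in model.named_parameters():
    if param.device.type == "meta":
        print("still on meta:", name)
```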

Environment

- OS: Ubuntu 22.04
- Python: 3.10.12
- Transformers: 4.40.0
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1

Anything else?

No response

bailove commented 1 week ago

Adding empty_init=False makes no difference.

Cuiunbo commented 1 week ago

Please see https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md
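
For readers hitting the same issue: the linked doc describes dispatching the model across GPUs with an explicit device map instead of relying on device_map='auto'. Below is a minimal sketch of that approach using accelerate, not the doc verbatim. The module names ('vpm', 'resampler', 'llm.model.embed_tokens', 'llm.lm_head') and the memory limits are assumptions about this model's layout; verify them against model.named_modules() and defer to the linked doc.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from accelerate import infer_auto_device_map, dispatch_model

model_path = 'openbmb/MiniCPM-Llama3-V-2_5'

# Materialize all weights on CPU first so nothing is left on the meta device.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  torch_dtype=torch.float16)

# Spread the decoder layers across the two GPUs; never split a single layer.
# The max_memory values are placeholders -- size them to your cards.
device_map = infer_auto_device_map(
    model,
    max_memory={0: '10GiB', 1: '10GiB'},
    no_split_module_classes=['LlamaDecoderLayer'],
)

# The vision encoder, resampler, embeddings, and lm_head exchange tensors
# directly in the custom modeling code, so pin them all to the same GPU.
# (Module names here are assumptions -- check model.named_modules().)
for name in ('vpm', 'resampler', 'llm.model.embed_tokens', 'llm.lm_head'):
    for key in [k for k in device_map if k == name or k.startswith(name + '.')]:
        del device_map[key]  # drop finer-grained entries under this module
    device_map[name] = 0

model = dispatch_model(model, device_map=device_map)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```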