OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0

[BUG] Multi-GPU deployment of OmniLMM-12B reports that all data must be on the same device #240

Closed SKY072410 closed 4 weeks ago

SKY072410 commented 1 month ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

On two 16 GB 3080 GPUs, MiniCPM-Llama3-V deploys and runs inference successfully when following the official multi-GPU instructions. Deploying OmniLMM-12B across multiple GPUs, however, still fails even though I set `device_map["model.embed_tokens"] = 0`, `device_map["model.layers.0"] = 0`, `device_map["model.layers.31"] = 0`, `device_map["model.norm"] = 0`, `device_map["model.resampler"] = 0`, `device_map["model.vision_tower"] = 0`, and `device_map["lm_head"] = 0` as instructed, which should keep the inputs and outputs on the same card. The error still says the data is not on one device: `Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
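For context, the pinning described above roughly corresponds to the following sketch, assuming the model is loaded through Hugging Face `transformers` / `accelerate`. The module names are the ones listed in this issue; the checkpoint id, memory limits, and `no_split_module_classes` entry are assumptions, not the repository's exact code.

```python
# Minimal sketch, not the repo's official script: build a device_map that splits
# OmniLMM-12B across two GPUs while pinning the input/output modules to GPU 0.
import torch
from transformers import AutoConfig, AutoModel
from accelerate import init_empty_weights, infer_auto_device_map

MODEL_PATH = "openbmb/OmniLMM-12B"  # assumption: local path or hub id of the checkpoint

config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True)
with init_empty_weights():
    empty_model = AutoModel.from_config(config, trust_remote_code=True)

# Let accelerate propose a split over the two 16 GB cards first.
device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "14GiB", 1: "14GiB"},            # leave headroom for activations
    no_split_module_classes=["LlamaDecoderLayer"],  # assumption: the model's decoder block class
)

# Pin the modules listed in this issue to GPU 0 so inputs and outputs share a device.
for name in ["model.embed_tokens", "model.layers.0", "model.layers.31",
             "model.norm", "model.resampler", "model.vision_tower", "lm_head"]:
    device_map[name] = 0

model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
```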

Expected Behavior

How can this problem be solved?

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

iceflame89 commented 4 weeks ago

Changing `if False` to `if True` at chat.py#L30-L31 enables multi-GPU inference for OmniLMM-12B.
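For readers without chat.py open, that flag presumably gates whether the model is dispatched across GPUs or loaded onto a single device. The snippet below is a hypothetical illustration of such a switch, not the actual chat.py code; the checkpoint id is an assumption.

```python
# Hypothetical illustration only; the real chat.py#L30-L31 may differ.
import torch
from transformers import AutoModel

MODEL_PATH = "openbmb/OmniLMM-12B"  # assumed checkpoint id
USE_MULTI_GPU = True                # the gate that was effectively False before

if USE_MULTI_GPU:
    # Spread the model over all visible GPUs via an automatic device_map.
    model = AutoModel.from_pretrained(
        MODEL_PATH, trust_remote_code=True,
        torch_dtype=torch.float16, device_map="auto",
    )
else:
    # Load everything onto a single card.
    model = AutoModel.from_pretrained(
        MODEL_PATH, trust_remote_code=True, torch_dtype=torch.float16,
    ).to("cuda:0")
```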

SKY072410 commented 3 weeks ago

Simply changing `False` to `True` still raises an error; it seems the tensors are still not on the same device: `RuntimeError: Tensor on device cuda:0 is not on the expected device meta!`
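The mention of the `meta` device suggests some weights were never materialized onto a real GPU during dispatch. A small diagnostic sketch, assuming `model` is the loaded OmniLMM-12B instance, that lists any tensors still on the meta device:

```python
# Diagnostic sketch: find parameters/buffers that were never moved off `meta`.
meta_tensors = [
    name
    for name, t in list(model.named_parameters()) + list(model.named_buffers())
    if t.device.type == "meta"
]
print(f"{len(meta_tensors)} tensors still on the meta device")
for name in meta_tensors[:20]:
    print("  ", name)
```

If any show up, the modules they belong to most likely need an explicit entry in the device_map (or were otherwise skipped when the checkpoint was dispatched).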