OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

[BUG] MiniCPM-Llama3-V-2.5 fails to start on a 24 GB 4090 #288

Closed weiminw closed 2 weeks ago

weiminw commented 2 weeks ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Following the sample you provide on Hugging Face, startup fails with this error:

Loading checkpoint shards: 100%|██████████████████| 7/7 [00:02<00:00,  2.36it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/workspace/heliumos-bixi/heliumos-bixi-vision/tests/test_minicpmv.py", line 7, in <module>
    model.to(device='cuda')
  File "/workspace/heliumos-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2724, in to
    return super().to(*args, **kwargs)
  File "/workspace/heliumos-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/workspace/heliumos-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/workspace/heliumos-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/workspace/heliumos-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/workspace/heliumos-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/workspace/heliumos-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 

Expected Behavior

I expect to be able to use it on a 4090.

Steps To Reproduce

No response

Environment

- OS: ubuntu 22.04
- Python: 3.10.2
- Transformers: 4.41.2
- PyTorch: 2.3.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1

Anything else?

No response

1SingleFeng commented 2 weeks ago

Is this training or inference?

weiminw commented 2 weeks ago

> Is this training or inference?

Inference.

weiminw commented 2 weeks ago

The test code I used is exactly what's provided on the Hugging Face model page.

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, torch_dtype=torch.float16)
model = model.to(device='cuda')

tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

res = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True, # if sampling=False, beam_search will be used by default
    temperature=0.7,
    # system_prompt='' # pass system_prompt if needed
)
print(res)

1SingleFeng commented 2 weeks ago

> The test code I used is exactly what's provided on the Hugging Face model page.

Then I'm not sure. I also have a 24 GB 4090 but haven't tested it yet; if I run into problems when I do, I'll get back to you.

1SingleFeng commented 2 weeks ago

> model = model.to(device='cuda')

The error you're getting is running out of GPU memory; you could try running inference with the model on the CPU instead.
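
A minimal sketch of that suggestion, assuming the same sample script; the only change is leaving the model on the CPU instead of calling model.to(device='cuda') (inference will be much slower):

import torch
from transformers import AutoModel, AutoTokenizer

# Load on the CPU and skip the .to(device='cuda') call entirely.
# float32 is the safe dtype on CPU; float16 CPU support is limited.
model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5',
                                  trust_remote_code=True,
                                  torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5',
                                          trust_remote_code=True)
model.eval()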

weiminw commented 2 weeks ago

Right, but the official figure is 17-18 GB of VRAM for fp16, and mine won't start on 24 GB, which is odd.

1SingleFeng commented 2 weeks ago

> Right, but the official figure is 17-18 GB of VRAM for fp16, and mine won't start on 24 GB, which is odd.

I'm not sure what's going wrong on your side; using the official sample code at https://github.com/OpenBMB/MiniCPM-V#multi-turn-conversation, it runs fine for me.

Zhaojjjjjj commented 2 weeks ago

> I'm not sure what's going wrong on your side; using the official sample code at https://github.com/OpenBMB/MiniCPM-V#multi-turn-conversation, it runs fine for me.

Hey, how do I fix this one...

1SingleFeng commented 2 weeks ago

> Hey, how do I fix this one...

You need to download the minicpm-llama3-v-2_5 weights from Hugging Face (download link: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) and point the code at the local weight path; otherwise it downloads them automatically, and judging by your error your local network can't reach Hugging Face, so the download fails.
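
A minimal sketch of that approach; the local directory below is a hypothetical placeholder for wherever you put the downloaded weights:

import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical local path: a directory containing the files from
# https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5
model_path = '/data/models/MiniCPM-Llama3-V-2_5'

# Passing a local path makes from_pretrained read from disk instead of
# trying to reach huggingface.co.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)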

weiminw commented 2 weeks ago

> I'm not sure what's going wrong on your side; using the official sample code at https://github.com/OpenBMB/MiniCPM-V#multi-turn-conversation, it runs fine for me.

I did a comparison: after loading the model you need model.to(dtype=torch.bfloat16). That's the key; with it added, it starts fine on 24 GB.
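
For reference, a minimal sketch of the load sequence described above (the original sample plus the bfloat16 cast, which is what reportedly brought startup within 24 GB):

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5',
                                  trust_remote_code=True,
                                  torch_dtype=torch.float16)
# The key line from this thread: cast the weights to bfloat16 after loading,
# then move the model to the GPU.
model = model.to(dtype=torch.bfloat16)
model = model.to(device='cuda')
model.eval()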

1SingleFeng commented 2 weeks ago

> I did a comparison: after loading the model you need model.to(dtype=torch.bfloat16). That's the key; with it added, it starts fine on 24 GB.

Got it, thanks~

weiminw commented 2 weeks ago

The problem is solved.

Zhaojjjjjj commented 2 weeks ago

> You need to download the minicpm-llama3-v-2_5 weights from Hugging Face (download link: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) and point the code at the local weight path; otherwise it downloads them automatically, and judging by your error your local network can't reach Hugging Face, so the download fails.

Traceback (most recent call last):
  File "/Users/ss/web/LOCAL/MiniCPM-V-main/web_demo_2.5.py", line 34, in <module>
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True, torch_dtype=torch.float16, device_map=device)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3122, in from_pretrained
    raise ImportError(
ImportError: Using low_cpu_mem_usage=True or a device_map requires Accelerate: pip install accelerate

I have already installed accelerate, but running it still raises this error. What's going on...
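
One common cause of this (an assumption on my part, not confirmed in the thread) is that accelerate was pip-installed into a different Python environment than the one running web_demo_2.5.py. A quick check, run with the same interpreter that launches the demo:

import importlib.util
import sys

# Which interpreter is actually running, and can it see accelerate?
print(sys.executable)
print(importlib.util.find_spec('accelerate'))  # None means not installed here

If find_spec returns None, reinstalling with that exact interpreter (python3 -m pip install accelerate) should resolve the ImportError.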