OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

[BUG] Trying to load int4: ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`. #379

Closed: thistleknot closed this issue 1 month ago

thistleknot commented 1 month ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

The boilerplate code in the README doesn't work for the GGUF or int4-quantized models.

Expected Behavior

The model should load using the boilerplate code for every model listed on the README page, quantized or not.

Steps To Reproduce

```python
from chat import MiniCPMVChat, img2base64
import torch
import json

torch.manual_seed(0)

chat_model = MiniCPMVChat('./models/MiniCPM-Llama3-V-2_5-int4')

im_64 = img2base64('./images/scryfall/Black Market Connections (Evyn Fong) [CLB] /{669/}.jpg')

# First round chat
msgs = [{"role": "user", "content": "Tell me about the objects in this image."}]

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)

# Second round chat: pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Introduce something about the object(s) identified."})

inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)
```

Environment

- OS: Rocky Linux 9
- Python: 3.10
- Transformers: whatever version requirements.txt installed
- PyTorch: likewise, from requirements.txt
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.2

Anything else?

I would appreciate boilerplate code for loading the quantized models.
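
For reference, a minimal sketch of what loading the int4 checkpoint without `.to()` might look like: bitsandbytes-quantized weights are already dispatched to the correct device and dtype at load time, and calling `.to()` afterwards is exactly what raises this ValueError. The local path mirrors the repro above; treat the details as an assumption, not the project's official snippet.

```python
# Hedged sketch: load the bitsandbytes int4 model as-is, with no .to()/.cuda()/.half()
from transformers import AutoModel, AutoTokenizer

model_path = './models/MiniCPM-Llama3-V-2_5-int4'  # local int4 checkpoint from the repro above
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

# Calling model.to('cuda') here would raise:
# ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. ...
```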

thistleknot commented 1 month ago

Found instructions for GGUF at https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv

but after following them, running:

```
./llama.cpp/minicpmv-cli -m ./models/minicpm-2b-dpo-fp32.Q6_K.gguf --mmproj ./models/mmproj-model-f16.gguf -c 512 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image ./images/scryfall/test_image.jpg -p "What is in the image?"
```

fails with:

```
llama_model_load: error loading model: check_tensor_dims: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llava_init: error: unable to load model
minicpmv_init: error: failed to init minicpmv model

What is in the image? Segmentation fault (core dumped)
```
thistleknot commented 1 month ago

Never mind. The failure above appears to have come from passing the wrong model file to `-m` (a text-only LLM GGUF rather than the converted MiniCPM-V model). While I'm still not sure how the int4 checkpoint is used, the instructions were adequate to get me going:

```
./llama.cpp/minicpmv-cli -m ./models/ggml-model-Q4_K_M.gguf --mmproj ./models/mmproj-model-f16.gguf -c 512 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image ./images/scryfall/test_image.jpg -p "What is in the image?"
```
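
For anyone following along: `ggml-model-Q4_K_M.gguf` is presumably the converted MiniCPM-V checkpoint quantized to Q4_K_M. A sketch of that step, assuming the stock llama.cpp quantize tool and that the f16 conversion from the linked instructions already exists; the binary and file names here are assumptions:

```sh
# Hedged sketch: produce the Q4_K_M model from the converted f16 GGUF.
# The binary may be named ./quantize or ./llama-quantize depending on the llama.cpp version.
./llama.cpp/quantize ./models/ggml-model-f16.gguf ./models/ggml-model-Q4_K_M.gguf Q4_K_M
```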