Found gguf instructions at https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv, but after following them with

```
./llama.cpp/minicpmv-cli -m ./models/minicpm-2b-dpo-fp32.Q6_K.gguf --mmproj ./models/mmproj-model-f16.gguf -c 512 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image ./images/scryfall/test_image.jpg -p "What is in the image?"
```

the model fails to load:

```
llama_model_load: error loading model: check_tensor_dims: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llava_init: error: unable to load model
minicpmv_init: error: failed to init minicpmv model
```
Never mind: the problem was that `-m` pointed at the text-only MiniCPM-2B gguf rather than the converted MiniCPM-Llama3-V 2.5 model. While I'm still not sure how the int4 model is meant to be used, the instructions were adequate to get me going. This works:

```
./llama.cpp/minicpmv-cli -m ./models/ggml-model-Q4_K_M.gguf --mmproj ./models/mmproj-model-f16.gguf -c 512 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image ./images/scryfall/test_image.jpg -p "What is in the image?"
```
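For anyone running this over a whole directory of card images, a small wrapper around the working command is enough. A minimal sketch (the `describe` helper is mine; the model paths are the ones from the command above):

```python
import subprocess
from pathlib import Path

def describe(image_path: str, prompt: str = "What is in the image?") -> str:
    """Run the working minicpmv-cli command for one image and return its stdout."""
    cmd = [
        "./llama.cpp/minicpmv-cli",
        "-m", "./models/ggml-model-Q4_K_M.gguf",
        "--mmproj", "./models/mmproj-model-f16.gguf",
        "-c", "512", "--temp", "0.7", "--top-p", "0.8",
        "--top-k", "100", "--repeat-penalty", "1.05",
        "--image", image_path, "-p", prompt,
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# Describe every jpg in the scryfall folder (path assumed from the commands above).
for img in sorted(Path("./images/scryfall").glob("*.jpg")):
    print(img.name, describe(str(img)))
```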
### 是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

### 该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
### 当前行为 | Current Behavior

The boilerplate code in the README does not work for the gguf or int4 quantized models.
### 期望行为 | Expected Behavior

The model loads using the boilerplate code for every model listed on the README page, quantized or not.
### 复现方法 | Steps To Reproduce

```python
from chat import MiniCPMVChat, img2base64
import torch
import json

torch.manual_seed(0)

chat_model = MiniCPMVChat('./models/MiniCPM-Llama3-V-2_5-int4')

im_64 = img2base64('./images/scryfall/Black Market Connections (Evyn Fong) [CLB] {669}.jpg')

# First round chat
msgs = [{"role": "user", "content": "Tell me about the objects in this image."}]
inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)

# Second round chat
# pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": answer})
msgs.append({"role": "user", "content": "Introduce something about the object(s) identified."})
inputs = {"image": im_64, "question": json.dumps(msgs)}
answer = chat_model.chat(inputs)
print(answer)
```
### 运行环境 | Environment
### 备注 | Anything else?

I would appreciate boilerplate code in the README for loading the quantized (gguf and int4) models.
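For reference, this is the pattern I would expect for the int4 checkpoint, sketched from the standard `transformers` usage shown for the full model. I have not verified it against the int4 weights: the `model.chat` signature and the absence of a `.to(device)` call are assumptions, based on the main README example and on bitsandbytes handling device placement itself.

```python
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Local int4 checkpoint path taken from the repro above; bitsandbytes-quantized
# weights are loaded straight to the GPU, so no explicit .to('cuda') call.
model_path = './models/MiniCPM-Llama3-V-2_5-int4'
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

image = Image.open('./images/scryfall/test_image.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': 'Tell me about the objects in this image.'}]

# Assumed to match the chat API from the non-quantized README example.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```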