是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
[X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
[X] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
When certain images are given as input, the program stops after clip_image_preprocess and clip_image_build_graph, and then exits.
The program should start encode_image_with_clip after clip_image_build_graph, but it doesn't.
My guess is that uhd_slice_image resizes the image too small for CLIP to encode, or that the image itself is too small.
However, the official demo handles these same images without problems.
I used ggml-model-Q4_K_M.gguf on llama.cpp.
期望行为 | Expected Behavior
The program should process the images.
复现方法 | Steps To Reproduce
```
llama-minicpmv-cli.exe -m ggml-model-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf -c 512 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image QQQQQQQ_8408.jpg -p "return the text"
```
运行环境 | Environment
备注 | Anything else?
```
clip_model_load: CLIP using CUDA backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 0
clip_model_load: minicpmv_projector: 1
clip_model_load: model size: 996.02 MB
clip_model_load: metadata size: 0.16 MB
clip_model_load: params backend buffer size = 996.02 MB (455 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_image_build_graph: 448 448
clip_model_load: compute allocated memory: 102.80 MB
uhd_slice_image: multiple 1
clip_image_preprocess: 1050 196
clip_image_build_graph: 1050 196
```