abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Support for MiniCPM-Llama3-V 2.5 / OCR Text Extraction #1532

Open ammyt opened 5 months ago

ammyt commented 5 months ago

Hello, there are models available for MiniCPM-Llama3-V 2.5: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/blob/main/mmproj-model-f16.gguf https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/blob/main/ggml-model-Q8_0.gguf

Has anyone tried running these models / are they supported?

I'm asking because the text extraction / OCR capabilities look very promising: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5

Otherwise I would love to hear about your successes with OCR :-)
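For reference, this is roughly how llama-cpp-python loads a vision GGUF pair (language model + `mmproj` projector) today, using the LLaVA-1.5 chat handler and an OpenAI-style multimodal message. Note this is a sketch only: MiniCPM-Llama3-V 2.5 uses its own projector, so the LLaVA handler may not actually work with it, and the file names below are just the ones from the Hugging Face links above, assumed to be downloaded locally.

```python
import base64
import os


def image_to_data_uri(path: str) -> str:
    """Encode a local image as a base64 data URI, the form the
    OpenAI-style image_url content part accepts."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{data}"


def build_vision_messages(image_uri: str, prompt: str) -> list:
    """Build one user turn containing an image part followed by a
    text part, as create_chat_completion expects for vision models."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_uri}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    # Paths assumed: the two GGUF files linked above, downloaded locally.
    MODEL = "ggml-model-Q8_0.gguf"
    MMPROJ = "mmproj-model-f16.gguf"
    if os.path.exists(MODEL) and os.path.exists(MMPROJ):
        from llama_cpp import Llama
        from llama_cpp.llama_chat_format import Llava15ChatHandler

        llm = Llama(
            model_path=MODEL,
            chat_handler=Llava15ChatHandler(clip_model_path=MMPROJ),
            n_ctx=4096,  # leave room for the image embedding tokens
        )
        messages = build_vision_messages(
            image_to_data_uri("receipt.png"),  # hypothetical test image
            "Extract all visible text from this image.",
        )
        out = llm.create_chat_completion(messages=messages)
        print(out["choices"][0]["message"]["content"])
```

If the projector architecture isn't one llama.cpp's CLIP loader recognizes, loading will fail at the `Llava15ChatHandler` step, which is consistent with the model needing a forked llama.cpp.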

jndiogo commented 5 months ago

A few weeks ago I did a survey of local image-input models while adding image capabilities to Sibila and checked this model, but it seems to depend on a forked llama.cpp, so it won't run with the current llama-cpp-python.

I tried these local models: https://jndiogo.github.io/sibila/models/vision/#local-models

They're okay for describing images and such, but for OCR they're not on the level of GPT-4o.

For a possibly more reliable option, see this approach, which separates the OCR step from the model's interpretation of the text: https://blog.gopenai.com/open-source-document-extraction-using-mistral-7b-llm-18bf437ca1d2
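The two-stage idea in that post can be sketched as: a dedicated OCR engine reads the image, and the LLM only structures the resulting text. Below is a minimal sketch; the prompt builder is generic, while the OCR stage assumes pytesseract and Pillow are installed and `invoice.png` with fields like `invoice_number` is just a hypothetical example.

```python
def build_extraction_prompt(ocr_text: str, fields: list) -> str:
    """Stage 2 prompt: ask a text-only LLM to structure text that a
    dedicated OCR engine already extracted, instead of asking a vision
    model to read the image directly."""
    field_list = "\n".join(f"- {f}" for f in fields)
    return (
        "Extract the following fields from the OCR text below. "
        "Answer with JSON only; use null for any field that is missing.\n"
        f"Fields:\n{field_list}\n"
        f'OCR text:\n"""\n{ocr_text}\n"""'
    )


if __name__ == "__main__":
    # Stage 1 (assumed dependencies): OCR with Tesseract.
    import pytesseract
    from PIL import Image

    ocr_text = pytesseract.image_to_string(Image.open("invoice.png"))
    prompt = build_extraction_prompt(
        ocr_text, ["invoice_number", "date", "total"]
    )
    # Stage 2: feed `prompt` to any text-only model, e.g. a plain
    # llama-cpp-python Llama instance -- no mmproj file needed.
```

The appeal is that OCR accuracy no longer depends on the vision model's projector quality, and the LLM's job shrinks to structuring text, which the smaller local models above handle much better than reading pixels.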