abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Support for MiniCPM-Llama3-V 2.5 / OCR Text Extraction #1532

Open ammyt opened 5 months ago

ammyt commented 5 months ago

Hello, there are models available for MiniCPM-Llama3-V 2.5: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/blob/main/mmproj-model-f16.gguf https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/blob/main/ggml-model-Q8_0.gguf

Has anyone tried running these models / are they supported?

I'm asking because the text extraction / OCR capabilities look very promising: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5

Otherwise I would love to hear about your successes with OCR :-)
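For reference, this is roughly how llama-cpp-python loads a vision GGUF pair (language model + `mmproj` projector) today, using the LLaVA-1.5 chat handler and an OpenAI-style multimodal message. Note this is a sketch only: MiniCPM-Llama3-V 2.5 uses its own projector, so the LLaVA handler may not actually work with it, and the file names below are just the ones from the Hugging Face links above, assumed to be downloaded locally.

```python
import base64
import os


def image_to_data_uri(path: str) -> str:
    """Encode a local image as a base64 data URI, the form the
    OpenAI-style image_url content part accepts."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{data}"


def build_vision_messages(image_uri: str, prompt: str) -> list:
    """Build one user turn containing an image part followed by a
    text part, as create_chat_completion expects for vision models."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_uri}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    # Paths assumed: the two GGUF files linked above, downloaded locally.
    MODEL = "ggml-model-Q8_0.gguf"
    MMPROJ = "mmproj-model-f16.gguf"
    if os.path.exists(MODEL) and os.path.exists(MMPROJ):
        from llama_cpp import Llama
        from llama_cpp.llama_chat_format import Llava15ChatHandler

        llm = Llama(
            model_path=MODEL,
            chat_handler=Llava15ChatHandler(clip_model_path=MMPROJ),
            n_ctx=4096,  # leave room for the image embedding tokens
        )
        messages = build_vision_messages(
            image_to_data_uri("receipt.png"),  # hypothetical test image
            "Extract all visible text from this image.",
        )
        out = llm.create_chat_completion(messages=messages)
        print(out["choices"][0]["message"]["content"])
```

If the projector architecture isn't one llama.cpp's CLIP loader recognizes, loading will fail at the `Llava15ChatHandler` step, which is consistent with the model needing a forked llama.cpp.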

jndiogo commented 5 months ago

A few weeks ago I did a survey of local image-input models while adding image capabilities to Sibila and checked this model, but it seems to depend on a forked llama.cpp, so it won't run with the current llama-cpp-python.

I tried these local models: https://jndiogo.github.io/sibila/models/vision/#local-models

They're okay for describing images and such, but for OCR they're not on the level of GPT-4o.

For a possibly more reliable option, see this approach, which separates the OCR step from the model's interpretation of the text: https://blog.gopenai.com/open-source-document-extraction-using-mistral-7b-llm-18bf437ca1d2
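The two-stage idea in that post can be sketched as: a dedicated OCR engine reads the image, and the LLM only structures the resulting text. Below is a minimal sketch; the prompt builder is generic, while the OCR stage assumes pytesseract and Pillow are installed and `invoice.png` with fields like `invoice_number` is just a hypothetical example.

```python
def build_extraction_prompt(ocr_text: str, fields: list) -> str:
    """Stage 2 prompt: ask a text-only LLM to structure text that a
    dedicated OCR engine already extracted, instead of asking a vision
    model to read the image directly."""
    field_list = "\n".join(f"- {f}" for f in fields)
    return (
        "Extract the following fields from the OCR text below. "
        "Answer with JSON only; use null for any field that is missing.\n"
        f"Fields:\n{field_list}\n"
        f'OCR text:\n"""\n{ocr_text}\n"""'
    )


if __name__ == "__main__":
    # Stage 1 (assumed dependencies): OCR with Tesseract.
    import pytesseract
    from PIL import Image

    ocr_text = pytesseract.image_to_string(Image.open("invoice.png"))
    prompt = build_extraction_prompt(
        ocr_text, ["invoice_number", "date", "total"]
    )
    # Stage 2: feed `prompt` to any text-only model, e.g. a plain
    # llama-cpp-python Llama instance -- no mmproj file needed.
```

The appeal is that OCR accuracy no longer depends on the vision model's projector quality, and the LLM's job shrinks to structuring text, which the smaller local models above handle much better than reading pixels.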