ammyt opened this issue 5 months ago
A few weeks ago, while adding image capabilities to Sibila, I surveyed local image-input models and checked this model, but it seems to depend on a forked llama.cpp, so it won't run with the current llama-cpp-python.
I tried these local models: https://jndiogo.github.io/sibila/models/vision/#local-models
They're okay for describing images and such, but for OCR they're not at the level of GPT-4o.
A possibly safer option is to separate the OCR step from the model's interpretation of the extracted text; see this approach: https://blog.gopenai.com/open-source-document-extraction-using-mistral-7b-llm-18bf437ca1d2
Hello, there are GGUF models available for MiniCPM-Llama3-V 2.5:
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/blob/main/mmproj-model-f16.gguf
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/blob/main/ggml-model-Q8_0.gguf
Has anyone tried running these models / are they supported?
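For anyone who wants to try, here is a minimal sketch of how one would attempt it with llama-cpp-python's OpenAI-style multimodal chat API. This is an untested assumption, not a confirmation of support: the `Llava15ChatHandler` shown in the commented-out part is llama-cpp-python's LLaVA-style handler, and MiniCPM-Llama3-V 2.5 may still require the forked llama.cpp instead; the file paths and the OCR prompt are placeholders. The message-building helpers themselves are plain Python and handler-agnostic.

```python
import base64

def image_to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI, the format accepted by
    OpenAI-style `image_url` content parts."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def build_ocr_messages(image_bytes: bytes) -> list:
    """Build an OpenAI-style message list asking the model to transcribe text."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": image_to_data_uri(image_bytes)}},
                {"type": "text",
                 "text": "Transcribe all text in this image verbatim."},
            ],
        }
    ]

# Hypothetical usage -- handler choice and paths are assumptions, and MiniCPM
# support may require the forked llama.cpp rather than stock llama-cpp-python:
#
# from llama_cpp import Llama
# from llama_cpp.llama_chat_format import Llava15ChatHandler
#
# handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
# llm = Llama(model_path="ggml-model-Q8_0.gguf",
#             chat_handler=handler, n_ctx=4096)
# with open("page.png", "rb") as f:
#     out = llm.create_chat_completion(messages=build_ocr_messages(f.read()))
```

If loading fails at the projector (mmproj) stage, that would point to the forked-llama.cpp dependency mentioned above.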
I ask because the text extraction / OCR capabilities look very promising: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5
Otherwise I would love to hear about your successes with OCR :-)