Closed: nischalj10 closed this issue 1 month ago.
Yup! Work in progress at #7553.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hey there, I hope all is well. Any luck getting the model to run?
I'm quite curious. I managed to use clip.cpp quantization on Phi-3 Vision's projector and llama.cpp quantization on the language-model component, and the result was a pretty useful VLM with a total size of 1.5 GB. That's great, since it proves VLMs can be made much smaller and still stay accurate, but in terms of overall functionality PaliGemma is a much better choice: it has baked-in bounding-box abilities, which on their own open up limitless possibilities.
Prerequisites
Feature Description
It's a really solid model and has a lot of requests in the discussions.
Motivation
Punches way above its weight and has really good OCR capabilities.
Possible Implementation
No response