Closed: nischalj10 closed this issue 1 month ago.
Yup! Work in progress at #7553.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hey there, I hope all is well. Any luck getting the model to run?
I'm quite curious. I managed to use clip.cpp quantization on Phi-3 Vision's projector and llama.cpp quantization on the language-model component, and the result was a pretty useful VLM with a total size of 1.5 GB. That's great, since it proves VLMs can be made much smaller and still stay accurate, but in terms of overall functionality PaliGemma is a much better choice: it has baked-in bounding-box abilities, which on their own open up limitless possibilities.
Prerequisites
Feature Description
It's a really solid model and has a lot of requests in the discussions.
Motivation
Punches way above its weight and has really good OCR capabilities.
Possible Implementation
No response