We had intended to merge vllm support soon; we started it here:
https://github.com/containers/ramalama/pull/97
This is an outline of what we think it should look like: basically, we want to introduce a --runtime flag, kind of like the podman one that switches between crun, runc, and krun, but in this case it would let you switch between llama.cpp, vllm, and whatever other runtimes people would like to integrate in the future.
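For illustration only, here is a rough sketch of how that flag could be used once it lands; the subcommand and model name are placeholders, not a committed interface:

```
# Hypothetical usage of the proposed --runtime flag (placeholder names):
ramalama --runtime llama.cpp serve tinyllama   # current default backend
ramalama --runtime vllm serve tinyllama        # same model, served by vllm instead
```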
The above is a key feature we want; it's one of the reasons we don't simply use Ollama.
Now that we have vllm v0.6.1, we are ready to complete that work.
Vision models like this would be useful for sure.
Personally, I'm gonna be out a bit over the next week or two; I have a wedding and other things I need to take some time for.
Anybody who wants to pick up --runtime, vllm support, vision model support, like you @p5 or others, be my guest.
@rhatdan merged the first vllm-related PR. I dunno if you want to take a stab at implementing the other things you had in mind, @p5?
@p5 still interested in this?
Hey Dan, Eric
My free time is very limited at the minute. Starting a new job in 2 weeks and there's a lot to get in order.
I still feel vision models would be a great addition to ramalama, but I'm going to be in a Windows-only environment :sigh: so unsure how much I'll be able to help out.
Thanks @p5, good luck with the new job.
Best of luck @p5. @bmahabirbu did have success running on Windows recently:
https://github.com/containers/ramalama/tree/main/docs/readme
FYI - Ollama is now implementing vision models, so once v0.4 is released, it might be easier to integrate here.
Indirectly, maybe. We inherit from the same backend, llama.cpp, but we don't actually use any Ollama stuff directly, even though to a user it might appear that way!
Oh, apologies. I thought Ramalama used both llama.cpp and ollama runtimes 🤦 Now I can see you use Ollama's registry and transport, served via llama.cpp runtime.
And we wrote the Ollama transport from scratch, so we use zero Ollama code.
What a lot of people don't realize is it's llama.cpp that does most of the heavy lifting for Ollama.
Value Statement
As someone who wants a boring way to use AI,
I would like to expose an image/PDF/document to the LLM,
So that I can make requests and extract information, all within Ramalama.
Notes
Various models now contain vision functionality, where they can ingest data from images and answer questions about those images. The accuracy of these LLM-based OCR text extractions can now exceed that of dedicated OCR tooling (even paid products like AWS Textract). The same vision models can also be used to extract information from PDF documents fairly easily after converting the documents to images.
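As one example of that PDF-to-image step (pdftoppm from poppler-utils is just one option, not something Ramalama ships or depends on; file names are illustrative):

```
# Rasterize each PDF page to a PNG so a vision model can ingest it
# (pdftoppm comes from poppler-utils, outside of Ramalama):
pdftoppm -png -r 150 invoice.pdf invoice-page
# Produces invoice-page-1.png, invoice-page-2.png, ...
```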
We can use a similar interface to the planned Whisper.cpp implementation, since both are just contexts or data we provide to the LLMs. This has not been detailed anywhere, so below is a proposal/example of how it could look.
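A rough sketch of how that could look on the command line; the --image flag and the model name below are invented for this proposal and do not exist in Ramalama today:

```
# Hypothetical invocation: attach an image as context and ask a question about it
ramalama run --image ./invoice-page-1.png llava \
  "What is the vendor name, invoice date, and total amount on this invoice?"
```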
The primary issue is that neither Ollama nor llama.cpp supports vision models at the moment, so this would either need a custom implementation or require adding something like vllm.