ericcurtin opened 4 months ago
Does this tool have an upstream? Where is the REPO?
Not sure I love the name.
Yeah, happy to rename, just needed to name it something:
https://github.com/ericcurtin/podman-llm
Could be llmc, llm-container, llm-oci, podllm? I really don't mind
It requires a couple of small patches to llama.cpp also, but nothing major:
@sallyom @Gregory-Pereira @MichaelClifford @cooktheryan PTAL
I was working with Ollama, but I worry about the long-term future there as regards external contributions:
https://github.com/ollama/ollama/pulls/ericcurtin
I fixed a lot of issues around OSTree-based OSes, podman support, Fedora support in general... But I just don't think the Ollama folks are genuinely interested in external contributions (these weren't complex reviews).
So I removed the middle component, Ollama itself, since Ollama is a llama.cpp wrapper. This uses llama.cpp pretty much directly, which kinda shows the Ollama layer isn't actually doing a whole pile.
What I really liked about Ollama is it simplified running LLMs to:
ollama run mistral
so that's what I was going for here. I think creating an Ollama clone that's built directly against llama.cpp library could do very well.
And this is daemonless (unlike Ollama): no clients, servers, etc. unless you actually want to serve, and it's zippier as a result.
This review in particular was super easy and would give rpm-ostree/bootc OS support:
There's obvious overlap with instructlab...
This is like a containerized, daemonless, simplified instructlab for dummies, kinda like Ollama.
If we felt this idea was worth pursuing there would probably be plenty of breaking changes to come. Some ideas we were thinking about: GGUFs would be delivered as single-file "FROM scratch" images (in their own gguf container store, to be used with the podman-llm:41, podman-llm-amd:41 or podman-llm-nvidia:41 container images).
So every "podman-llm run/serve" invocation is made up of some container image runtime (AMD, Nvidia, CPU, etc.) and a .gguf file which is delivered as a separate container image or downloaded from hugging face.
It's like Ollama with no custom "Modelfile" syntax (I think standard Containerfiles are better) and no special OCI format; a .gguf is just a .gguf, whether it comes from a container image or from Hugging Face directly.
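To make that concrete, here's a minimal sketch of what a single-file "FROM scratch" model image could look like (the file name and path are made up for illustration, not the actual podman-llm layout):

FROM scratch
# Nothing in this image except the model weights; any of the llama.cpp
# runtime images can then consume the .gguf out of it.
COPY granite-7b.gguf /granite-7b.gguf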
Some name change, like @rhatdan is proposing, to whatever people think sounds cool :)
But this is just a 20% project for me, so I would like to get people's opinions on whether something like this is worthwhile, etc.
Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much or more a repo showcasing bootc as it is AI recipes. This, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it to solely documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about the docs passing or not; it's about whether we want to make the choice to open the door to this type of contribution.
bootc is pretty useful for AI use-cases, even just for having the Nvidia dependencies pre-installed, which are not always trivial to install in a deployment.
podman-llm (to be renamed) would work within a bootc image, or a non-bootc image for that matter. The only real dependency it has is that podman (or docker) is installed.
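As a rough illustration, a bootc Containerfile only needs podman baked in for podman-llm to work; the base image tag and package set below are assumptions, and the Nvidia userspace bits could be layered in the same place:

# Hypothetical bootc image; fedora-bootc tag and packages are assumptions.
FROM quay.io/fedora/fedora-bootc:41
# podman is the only hard dependency podman-llm has.
RUN dnf -y install podman && dnf clean all
# The Nvidia driver/toolkit packages could be layered here too, turning a
# non-trivial per-deployment install into a one-time build step.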
@tumido's feedback could also be interesting. Looking at upcoming devconf.us talks, he is speaking about:
"Store AI/ML models efficiently with OCI Artifacts"
which is one of the things I am trying to do here, maybe we can combine efforts :)
I played around with a couple of ideas with different pros/cons: podman volumes, FROM scratch images, just simple container image inheritance. Right now it's a bind-mounted directory ($HOME/.cache/huggingface/) used to share .ggufs between multiple images. I bet @tumido has some interesting ideas :)
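For example, here's roughly what that bind mount looks like when running one of the runtime images (the container-side path and image tag are assumptions for illustration):

# Bind mount the host's Hugging Face cache so every runtime image
# sees the same .gguf files; :Z relabels the directory for SELinux.
podman run --rm -it \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface:Z" \
  podman-llm:41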
Updated README.md diagram to highlight the value of pulling different runtimes:
+--------------------+
| Pull runtime layer |
| for llama.cpp |
| (CPU, Vulkan, AMD, |
| Nvidia, Intel or |
| Apple Silicon) |
+--------------------+
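So, hypothetically, you pull only the runtime layer that matches your hardware, reusing the image names mentioned above (registry omitted for illustration):

podman pull podman-llm:41          # CPU-only llama.cpp build
podman pull podman-llm-amd:41      # AMD build
podman pull podman-llm-nvidia:41   # Nvidia build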
This is a tool that was written to be as simple as Ollama; in its simplest form it's:
podman-llm run granite
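and a serve counterpart, going by the run/serve split mentioned earlier (the exact invocation below is an assumed shape, not confirmed syntax):

podman-llm serve granite   # a server only exists when you explicitly ask for one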