containers / ai-lab-recipes

Examples for building and running LLM services and applications locally with Podman

[WIP] Add a model_server example podman-llm #649

Open ericcurtin opened 4 months ago

ericcurtin commented 4 months ago

This is a tool written to be as simple as Ollama; in its simplest form it's:

podman-llm run granite

rhatdan commented 4 months ago

Does this tool have an upstream? Where is the REPO?

Not sure I love the name.

ericcurtin commented 4 months ago

Yeah, happy to rename, just needed to name it something:

https://github.com/ericcurtin/podman-llm

Could be llmc, llm-container, llm-oci, podllm? I really don't mind

It requires a couple of small patches to llama.cpp also, but nothing major:

https://github.com/ericcurtin/llama.cpp/tree/podman-llm

rhatdan commented 4 months ago

@sallyom @Gregory-Pereira @MichaelClifford @cooktheryan PTAL

ericcurtin commented 4 months ago

I was working with Ollama, but I worry about the long-term future there when it comes to external contributions:

https://github.com/ollama/ollama/pulls/ericcurtin

I fixed a lot of issues around OSTree-based OSes, podman support, and Fedora support in general... But I just don't think the Ollama folks are genuinely interested in external contributions (and these weren't complex reviews).

So I removed the middle component, Ollama itself, since Ollama is a llama.cpp wrapper. This uses llama.cpp pretty much directly, which kinda shows that the Ollama layer isn't actually doing a whole pile.

What I really liked about Ollama is it simplified running LLMs to:

ollama run mistral

so that's what I was going for here. I think an Ollama clone built directly against the llama.cpp library could do very well.

And this is daemonless (unlike Ollama): no clients or servers (unless you want to serve), so it's zippier as a result.

ericcurtin commented 4 months ago

This review in particular was super easy and would give rpm-ostree/bootc OS support:

https://github.com/ollama/ollama/pull/3615/files

ericcurtin commented 4 months ago

There's obvious overlap with InstructLab...

This is like a containerized, daemonless, simplified InstructLab for dummies, kinda like Ollama.

ericcurtin commented 4 months ago

If we felt this idea was worth pursuing, there would probably be plenty of breaking changes to come. One idea we were thinking about: GGUFs would be delivered as single-file "FROM scratch" images (in their own gguf container store), to be used with the podman-llm:41, podman-llm-amd:41, or podman-llm-nvidia:41 container images.

So every "podman-llm run/serve" invocation is made up of some container image runtime (AMD, Nvidia, CPU, etc.) and a .gguf file which is delivered as a separate container image or downloaded from hugging face.

It's like Ollama with no custom "Modelfile" syntax (I think standard Containerfiles are better) and no special OCI format: a .gguf is just a .gguf, whether it comes from a container image or from Hugging Face directly.
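
To make that concrete, here's a minimal sketch of packaging a .gguf as a single-file "FROM scratch" image; the model filename and image tag are made-up examples, not something this PR ships:

# Containerfile (sketch): a model image is just the .gguf and nothing else
FROM scratch
COPY granite-7b-lab-Q4_K_M.gguf /model.gguf

# Built and pushed like any other image:
podman build -t quay.io/example/granite-gguf:latest .
podman push quay.io/example/granite-gguf:latest

A runtime image (podman-llm-nvidia:41, etc.) would then pull that image and read /model.gguf out of it at run/serve time.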

Plus some name change, like @rhatdan is proposing, to whatever people think sounds cool :)

But this is just a 20% project for me, so I'd like to get people's opinions on whether something like this is worthwhile, etc.

Gregory-Pereira commented 4 months ago

Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much or more a repo showcasing bootc as one showcasing AI recipes. That, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it to solely documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about the docs passing or not; it's about whether we want to make the choice to open the door to this type of contribution.

ericcurtin commented 4 months ago

> Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to or used in the recipes. However, there does seem to be some alignment here around bootc support, and in my eyes this can be considered as much or more a repo showcasing bootc as one showcasing AI recipes. That, coupled with the slow adoption in Ollama, reinforces that this should live somewhere within the containers org if it gets accepted. I'm in favor of pushing this through and keeping it to solely documentation in the model_servers dir for now. I suggest you also get another owner's buy-in though, because this is not so much about the docs passing or not; it's about whether we want to make the choice to open the door to this type of contribution.

bootc is pretty useful for AI use cases, even just for having the Nvidia dependencies pre-installed, which are not always trivial to install in a deployment.

podman-llm (to be renamed) would work within a bootc image, or a non-bootc image for that matter. The only real dependency it has is that podman (or docker) is installed.
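
For illustration, a rough sketch of a bootc Containerfile that satisfies that dependency; the base image tag and install path are assumptions, not part of this PR:

# Containerfile (sketch): a bootable OS image with podman and podman-llm baked in
FROM quay.io/fedora/fedora-bootc:41
RUN dnf -y install podman && dnf clean all
# hypothetical: copy the podman-llm script from the build context into the image
COPY podman-llm /usr/local/bin/podman-llm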

ericcurtin commented 4 months ago

@tumido's feedback could also be interesting; looking at upcoming devconf.us talks, he is speaking about:

"Store AI/ML models efficiently with OCI Artifacts"

which is one of the things I am trying to do here, maybe we can combine efforts :)
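
For context, one way to push a model to a registry as an OCI artifact today is with a tool like oras; the registry name and media type below are made-up examples, not what this PR currently does:

# Push a .gguf as an OCI artifact (sketch; names are hypothetical)
oras push registry.example.com/models/granite:latest \
  granite-7b-lab-Q4_K_M.gguf:application/vnd.example.gguf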

I played around with a couple of ideas with different pros/cons: podman volumes, FROM scratch images, and just simple container image inheritance. Right now it's a bind-mounted directory ($HOME/.cache/huggingface/) used to share .gguf files between multiple images. I bet @tumido has some interesting ideas :)
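
As a rough illustration of the current bind-mount approach, a runtime container would be launched along these lines; the in-container path and image tag are assumptions:

# Share the host's Hugging Face cache with the runtime container (sketch)
podman run --rm -it \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  podman-llm:41 ...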

ericcurtin commented 4 months ago

Updated README.md diagram to highlight the value of pulling different runtimes:

+--------------------+
| Pull runtime layer |
| for llama.cpp      |
| (CPU, Vulkan, AMD, |
|  Nvidia, Intel or  |
|  Apple Silicon)    |
+--------------------+