InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes

Support llama.cpp as alternative backend #65

Closed: kerthcet closed this issue 3 weeks ago

kerthcet commented 1 month ago

What would you like to be added:

llama.cpp supports running inference on CPUs, which is useful for users who have no GPU accelerators. In fact, it is also helpful to llmaz itself, since we have no GPU servers right now.
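For reference, a minimal sketch of what CPU-only llama.cpp looks like outside Kubernetes (the model path is a placeholder, and the server binary name has changed across llama.cpp versions, e.g. `server` vs. `llama-server`):

```sh
# Build llama.cpp; the default CMake build is CPU-only (no CUDA/Metal flags)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Serve a quantized GGUF model over HTTP on CPU
# (the model path below is a placeholder)
./build/bin/llama-server -m ./models/model-q4_0.gguf --host 0.0.0.0 --port 8080
```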

What's more, llama.cpp also supports multi-host inference; see https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc
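Based on that rpc example's README, a multi-host setup looks roughly like the sketch below. The CMake flag name has varied between releases (`LLAMA_RPC` in older builds, `GGML_RPC` in newer ones), and the worker addresses and model path are placeholders:

```sh
# On each worker node: build with RPC support and expose a ggml rpc-server
cmake -B build -DGGML_RPC=ON && cmake --build build --config Release
./build/bin/rpc-server -p 50052

# On the main host: distribute model layers across the workers via --rpc
./build/bin/llama-cli -m ./models/model-q4_0.gguf -p "Hello" \
  --rpc 192.168.88.10:50052,192.168.88.11:50052 -ngl 99
```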

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

kerthcet commented 1 month ago

/kind feature

kerthcet commented 3 weeks ago

/milestone v0.1.0

kerthcet commented 3 weeks ago

/assign

Let's support this so we can run e2e tests on CPU machines.