InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes

Support llama.cpp as alternative backend #65

Closed: kerthcet closed this issue 3 weeks ago

kerthcet commented 1 month ago

What would you like to be added:

llama.cpp supports running inference on CPUs, which is useful for users who have no GPU accelerators. In fact, it is also helpful to llmaz itself, since we have no GPU servers right now.
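For reference, a minimal sketch of what CPU-only llama.cpp looks like outside Kubernetes (the model path is a placeholder, and the server binary name has changed across llama.cpp versions, e.g. `server` vs. `llama-server`):

```sh
# Build llama.cpp; the default CMake build is CPU-only (no CUDA/Metal flags)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Serve a quantized GGUF model over HTTP on CPU
# (the model path below is a placeholder)
./build/bin/llama-server -m ./models/model-q4_0.gguf --host 0.0.0.0 --port 8080
```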

What's more, llama.cpp also supports multi-host inference; see https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc
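Based on that rpc example's README, a multi-host setup looks roughly like the sketch below. The CMake flag name has varied between releases (`LLAMA_RPC` in older builds, `GGML_RPC` in newer ones), and the worker addresses and model path are placeholders:

```sh
# On each worker node: build with RPC support and expose a ggml rpc-server
cmake -B build -DGGML_RPC=ON && cmake --build build --config Release
./build/bin/rpc-server -p 50052

# On the main host: distribute model layers across the workers via --rpc
./build/bin/llama-cli -m ./models/model-q4_0.gguf -p "Hello" \
  --rpc 192.168.88.10:50052,192.168.88.11:50052 -ngl 99
```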

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

kerthcet commented 1 month ago

/kind feature

kerthcet commented 3 weeks ago

/milestone v0.1.0

kerthcet commented 3 weeks ago

/assign

Let's support this so we can run e2e tests on CPU machines.