This is a redesign of `run` and `serve` so that RamaLama itself no longer runs inside the container; only the AI model runtime (llama.cpp or vllm) does. This should simplify operation, although it potentially exposes us to risk from whatever Python versions are available on the host.
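To make the change concrete, here is a minimal sketch (not the PR's actual code) of what the redesigned serve path could look like: the host-side RamaLama process composes a container command that starts only `llama-server` inside the container. The function name, image name, and mount path are illustrative assumptions.

```python
import subprocess

def serve_model(model_path: str, port: int = 8080) -> None:
    """Hypothetical sketch: launch only the runtime (llama-server) inside
    the container; the RamaLama Python code stays on the host."""
    cmd = [
        "podman", "run", "--rm",
        "-p", f"{port}:{port}",
        # Mount the host-resolved model read-only into the container.
        "-v", f"{model_path}:/model.gguf:ro",
        "quay.io/ramalama/ramalama",  # assumed runtime image name
        # The container entrypoint is the runtime itself, not RamaLama:
        "llama-server", "-m", "/model.gguf",
        "--host", "0.0.0.0", "--port", str(port),
    ]
    subprocess.run(cmd, check=True)
```

Because the RamaLama Python code now runs on the host rather than in a container with a pinned interpreter, it has to work with whichever Python version the host provides, which is the risk noted above.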