I have seen that once the `--with-cuda` flag is provided, the `cuda-ggml` image is built using the context in the docker-compose file. It would be nice to also support CUDA when deploying with Kubernetes. If there is support, or a way to deploy the Pods so they consume GPUs, I couldn't find it in the README.

From a quick look, the following steps would be required:
- Adding

  ```yaml
  resources:
    limits:
      nvidia.com/gpu: 1
  ```

  to the container section in the UI deployment

If you could publish the image, I could create a PR and work on this.
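For context, a rough sketch of how that limit could sit inside the Deployment manifest (the deployment name and image reference below are placeholders, not the project's actual values):

```yaml
# Hypothetical Deployment fragment -- names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
spec:
  template:
    spec:
      containers:
        - name: ui
          image: example/cuda-ggml:latest  # placeholder; would be the published CUDA image
          resources:
            limits:
              nvidia.com/gpu: 1  # schedulable only if the NVIDIA device plugin is installed
```

This assumes the cluster runs the NVIDIA device plugin (or GPU Operator), which is what makes `nvidia.com/gpu` an allocatable resource on the nodes.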
Thank you