getumbrel / llama-gpt

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
https://apps.umbrel.com/app/llama-gpt
MIT License

CUDA Support for Kubernetes #143

Open AndreasMurk opened 5 months ago

AndreasMurk commented 5 months ago

I have seen that once the `--with-cuda` flag is provided, the `cuda-ggml` image is built using the build context in the docker-compose file.

It would be nice to also support CUDA when deploying with Kubernetes. If there is already support, or a way to deploy the Pods so they consume GPUs, I couldn't find it in the README.

From a quick look, the following steps would be required:

  1. Make the CUDA image publicly available through https://ghcr.io
  2. Create a CUDA Service manifest, or set the container image to the CUDA image once a flag is provided
  3. Add `resources.limits` with `nvidia.com/gpu: 1` to the container section of the UI deployment
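For the third step, a minimal sketch of what the GPU-enabled deployment could look like; the image path and deployment names here are assumptions for illustration, not the project's actual manifests:

```yaml
# Hypothetical sketch only: names and the image path are assumptions,
# pending the CUDA image being published to ghcr.io.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-gpt-cuda
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-gpt-cuda
  template:
    metadata:
      labels:
        app: llama-gpt-cuda
    spec:
      containers:
        - name: llama-gpt-cuda
          # Assumed image path once published (step 1).
          image: ghcr.io/getumbrel/llama-gpt-cuda:latest
          resources:
            limits:
              # Requires the NVIDIA device plugin to be installed on the node.
              nvidia.com/gpu: 1
```

Note that `nvidia.com/gpu` is only schedulable on nodes running the NVIDIA device plugin DaemonSet.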

If you could publish the image, I could create a PR and work on this.

Thank you