ggerganov / llama.cpp

LLM inference in C/C++
MIT License

kubernetes example #6546

Open phymbert opened 1 month ago

phymbert commented 1 month ago

Motivation

Kubernetes is widely used in the industry to deploy products and applications at scale.

It would be useful for the community to have a llama.cpp Helm chart for the server.

I started this several weeks ago and will continue when I have more time; meanwhile, any help is welcome:

https://github.com/phymbert/llama.cpp/tree/example/kubernetes/examples/kubernetes


OmegAshEnr01n commented 1 month ago

Hi! I will take this up!

phymbert commented 1 month ago

Great @OmegAshEnr01n, a few notes:

Ping here if you have questions. Good luck! Excited to use it.

phymbert commented 1 month ago

Hi @OmegAshEnr01n, are you still working on this issue?

OmegAshEnr01n commented 1 month ago

Yes, still am. Will share a pull request over the weekend when completed.

OmegAshEnr01n commented 3 weeks ago

Hi @phymbert

What is the architectural reason for having the embeddings live in a separate deployment from the model? Requiring that would mean we would need to make changes to the HTTP server. Instead, we could have an architecture where the model and embeddings are tightly coupled. Something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      # one container per entry in .Values.containers, each with its own image
      {{- range $i, $container := .Values.containers }}
      - name: my-container-{{ $i }}
        image: {{ $container.image }}
        volumeMounts:
        - name: data-volume-{{ $i }}
          mountPath: /data
      {{- end }}
      volumes:
      # one PVC per container, matched by index to the mounts above
      {{- range $i, $container := .Values.containers }}
      - name: data-volume-{{ $i }}
        persistentVolumeClaim:
          claimName: pvc-{{ $i }}
      {{- end }}
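
For reference, the values.yaml this template would consume could look like the sketch below; the keys and images are only illustrative, not a final chart interface:

containers:
  - image: ghcr.io/ggerganov/llama.cpp:server   # generative model server
  - image: ghcr.io/ggerganov/llama.cpp:server   # embedding model server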

On another note, what is the intended use of Prometheus? Do you need it to live alongside the Helm chart or within it as a subchart? I don't see the value in adding Prometheus as a subchart. Perhaps you can share your view on that as well.

phymbert commented 3 weeks ago

Embedding models are different from generative ones. In a RAG setup you need two models.
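
As a rough sketch of what I mean, one values entry per model, with the second deployment started in embedding mode (names and paths below are illustrative, assuming the server's --embedding switch):

models:
  generative:
    image: ghcr.io/ggerganov/llama.cpp:server
    args: ["-m", "/models/generative.gguf"]
  embeddings:
    image: ghcr.io/ggerganov/llama.cpp:server
    args: ["-m", "/models/embeddings.gguf", "--embedding"]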

Prometheus is not required, but if it is present, metrics are exported.
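
The server exposes a Prometheus-compatible /metrics endpoint when started with --metrics, so on the chart side it is mostly a matter of making the pod scrapeable, for example with the usual annotations (a sketch, assuming the server's default port 8080 and a Prometheus scrape config that honours these annotations):

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "8080"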

OmegAshEnr01n commented 3 weeks ago

Ok, just to clarify: server.cpp has a route for requesting embeddings, but the existing server code doesn't include the option to send embeddings for completions. That would need to be written before the Helm chart can be completed. Kindly correct me if I'm wrong.

phymbert commented 3 weeks ago

Embeddings are meant to be stored in a vector DB for search. They have nothing to do with completions except for RAG later on, and nothing needs to change in the server code.

ceddybi commented 2 weeks ago

@OmegAshEnr01n Sir, is the chart ready for production? 🚀🚀🚀🚀

OmegAshEnr01n commented 2 weeks ago

Not yet. Currently testing it on a personal kube cluster with separate node selectors.
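
For context, something along these lines in each deployment's values, where the node labels are specific to my cluster rather than chart defaults:

generative:
  nodeSelector:
    llm-role: generative
embeddings:
  nodeSelector:
    llm-role: embeddings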