Open phymbert opened 1 month ago
Hi! I will take this up!
Great @OmegAshEnr01n, a few notes:
Ping here if you have questions, good luck! Excited to use it.
Hi @OmegAshEnr01n, are you still working on this issue?
Yes, still am. Will share a pull request over the weekend when completed.
Hi @phymbert
What is the architectural reason for having the embedding model live on a separate deployment from the generative model? Requiring that would mean making changes to the HTTP server. Instead, we could have an architecture where the model and the embeddings are tightly coupled, something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        {{- range $i, $container := .Values.containers }}
        - name: my-container-{{ $i }}
          image: {{ $container.image }}
          volumeMounts:
            - name: data-volume-{{ $i }}
              mountPath: /data
        {{- end }}
      volumes:
        {{- range $i, $container := .Values.containers }}
        - name: data-volume-{{ $i }}
          persistentVolumeClaim:
            claimName: pvc-{{ $i }}
        {{- end }}
On another note, what is the intended use of Prometheus? Do you need it to live alongside the helm chart or within it as a subchart? I don't see the value in adding Prometheus as a subchart. Perhaps you can share your view on that as well.
Embedding models are different from generative ones. In a RAG setup you need two models.
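To make the two-model point concrete, here is a minimal sketch of what separate workloads could look like, assuming one Deployment per model. The resource names, the image reference, and the model paths are hypothetical placeholders, not values from the chart in progress. The `--embedding` flag is an assumption about the server's CLI and may be spelled differently depending on the llama.cpp version.

```yaml
# Hypothetical sketch: one Deployment per model, so each can be scheduled
# and scaled independently. All names, paths and the image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-generation
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-generation
  template:
    metadata:
      labels:
        app: llama-generation
    spec:
      containers:
        - name: server
          # Placeholder image; assumed to use the llama.cpp server as entrypoint.
          image: example.invalid/llama-cpp-server:latest
          args: ["-m", "/models/generation.gguf", "--host", "0.0.0.0", "--port", "8080"]
          ports:
            - name: http
              containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-embeddings
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-embeddings
  template:
    metadata:
      labels:
        app: llama-embeddings
    spec:
      containers:
        - name: server
          image: example.invalid/llama-cpp-server:latest
          # Assumption: embedding extraction is enabled with a CLI flag.
          args: ["-m", "/models/embedding.gguf", "--embedding", "--host", "0.0.0.0", "--port", "8080"]
          ports:
            - name: http
              containerPort: 8080
```

With this split, the embedding model can run on smaller nodes than the generative one, and neither requires changes to the server itself.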
Prometheus is not required, but if it is present, metrics are exported.
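One way "not required, but used if present" could be expressed in a chart is a template that only renders when the Prometheus Operator CRDs are installed. This is a sketch under assumptions, not the chart's actual approach: it presumes the Prometheus Operator is what provides Prometheus, that a Service labelled `app: llama-server` with a port named `http` exists (both hypothetical), and that the server exposes a `/metrics` endpoint when metrics are enabled.

```yaml
# Hypothetical template: rendered only if the cluster knows the ServiceMonitor API.
{{- if .Capabilities.APIVersions.Has "monitoring.coreos.com/v1" }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llama-server
spec:
  selector:
    matchLabels:
      app: llama-server   # assumed label on the server's Service
  endpoints:
    - port: http          # assumed name of the Service port
      path: /metrics      # assumed metrics path
{{- end }}
```

When rendering offline with `helm template`, the capability is not detected from a cluster, so it would have to be passed explicitly with `--api-versions monitoring.coreos.com/v1`.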
OK, just to clarify: server.cpp has a route for requesting embeddings, but the existing server code doesn't include an option to send embeddings for completions. That will need to be written before the helm chart can be completed. Kindly correct me if I'm wrong.
Embeddings are meant to be stored in a vector DB for search. They have nothing to do with completions, except for RAG later on. Nothing needs to change in the server code.
@OmegAshEnr01n Sir, is the chart ready for production? 🚀🚀🚀🚀
Not yet. Currently testing it on a personal kube cluster with separate node selectors.
Motivation
Kubernetes is widely used in the industry to deploy products and applications at scale.
It can be useful for the community to have a llama.cpp helm chart for the server. I started on this several weeks ago and will continue when I have more time; meanwhile, any help is welcome:
https://github.com/phymbert/llama.cpp/tree/example/kubernetes/examples/kubernetes
References
6545