I want to deploy three models: one large language model occupying one GPU, plus one embedding model and one re-ranking model sharing the other GPU. How can I do this? #769
There are two GPU devices on the Kubernetes node,
timeSlicing.replicas is set to two,
the large language model requests nvidia.com/gpu: 2,
the other models each request nvidia.com/gpu: 1,
but the large language model's pod ends up with both physical GPU devices.
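For reference, this is roughly the setup being described, as a sketch (resource names and values follow the NVIDIA device plugin's time-slicing config format; the Deployment snippet is illustrative). With replicas: 2 on a node with two physical GPUs, the node advertises four nvidia.com/gpu resources, and a request for two of them may be satisfied by two time-sliced replicas that sit on different physical GPUs:

```yaml
# Time-slicing config for the NVIDIA k8s device plugin (sketch).
# Each physical GPU is advertised as 2 schedulable nvidia.com/gpu replicas,
# so this 2-GPU node exposes nvidia.com/gpu: 4 in total.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 2
---
# Illustrative pod spec fragment for the LLM (names are hypothetical).
# Requesting 2 replicas does NOT pin them to the same physical GPU:
# the scheduler may hand out one replica from each physical device,
# which matches the behavior described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm
  template:
    metadata:
      labels:
        app: llm
    spec:
      containers:
        - name: llm
          image: my-llm-image   # placeholder
          resources:
            limits:
              nvidia.com/gpu: 2
```

Note that time-sliced replicas are fungible from the scheduler's point of view, so there is no built-in way to express "two replicas of the same physical GPU" versus "one replica of each"; that is the crux of the problem described here.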