I want to deploy three models: one large language model occupying one GPU, plus one embedding model and one re-ranking model sharing the other GPU. How can I do this? #769
There are two GPU devices on the Kubernetes node,
timeSlicing.replicas is set to two,
the large language model requests nvidia.com/gpu: 2,
the other models each request nvidia.com/gpu: 1,
but the large language model's pod ends up with both physical GPU devices.
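For reference, this is roughly the setup being described, as a sketch (resource names and values follow the NVIDIA device plugin's time-slicing config format; the Deployment snippet is illustrative). With replicas: 2 on a node with two physical GPUs, the node advertises four nvidia.com/gpu resources, and a request for two of them may be satisfied by two time-sliced replicas that sit on different physical GPUs:

```yaml
# Time-slicing config for the NVIDIA k8s device plugin (sketch).
# Each physical GPU is advertised as 2 schedulable nvidia.com/gpu replicas,
# so this 2-GPU node exposes nvidia.com/gpu: 4 in total.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 2
---
# Illustrative pod spec fragment for the LLM (names are hypothetical).
# Requesting 2 replicas does NOT pin them to the same physical GPU:
# the scheduler may hand out one replica from each physical device,
# which matches the behavior described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm
  template:
    metadata:
      labels:
        app: llm
    spec:
      containers:
        - name: llm
          image: my-llm-image   # placeholder
          resources:
            limits:
              nvidia.com/gpu: 2
```

Note that time-sliced replicas are fungible from the scheduler's point of view, so there is no built-in way to express "two replicas of the same physical GPU" versus "one replica of each"; that is the crux of the problem described here.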