NVIDIA / gpu-operator

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
Apache License 2.0

Specifying Specific GPU Models for Pods in Nodes with Multiple GPU Types #656

Open anencore94 opened 8 months ago

anencore94 commented 8 months ago

Issue or feature description

I am currently working with a Kubernetes cluster where some nodes are equipped with multiple types of NVIDIA GPUs. For example, Node A has one A100 GPU and one V100 GPU. In such a setup, I am looking for a way to specify which GPU model should be allocated when a user creates a GPU-allocated pod.

From my understanding, in such cases, we would typically request a GPU in our pod specifications using resources.limits with nvidia.com/gpu: 1. However, this approach doesn't seem to provide a way to distinguish between different GPU models.
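For reference, a minimal example of the kind of request I mean (pod and container names are just placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # requests "one GPU" but cannot express which model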

Is there a feature or method within the NVIDIA GPU Operator or Kubernetes ecosystem that allows for such specific GPU model selection during pod creation? If not, are there any best practices or recommended approaches to ensure a pod is scheduled with a specific type of GPU when multiple models are present in the same node?

Thank you for your time and assistance.

cdesiniotis commented 8 months ago

@anencore94 there is unfortunately no supported way of accomplishing this today with the device plugin API.

Dynamic Resource Allocation, a new API for requesting and allocating resources in Kubernetes, would allow us to naturally support such configurations, but it is currently an alpha feature.
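For illustration only, a DRA-style request would look roughly like the sketch below. It assumes the alpha resource.k8s.io/v1alpha2 API and a gpu.nvidia.com resource class published by a DRA GPU driver; both the field names and the class name are assumptions and subject to change while the feature is in alpha.

# Sketch only: alpha API; field names and the resource class name are assumptions.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com   # assumed class name from a DRA GPU driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-gpu-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: gpu-claim-template
  containers:
  - name: ctr
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu              # the container consumes the claim defined above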

anencore94 commented 8 months ago

@cdesiniotis Thanks for sharing :). So implementing this feature on top of the Dynamic Resource Allocation API will probably take quite a long time, I guess...

laszlocph commented 7 months ago

I was able to pick the GPU by specifying the NVIDIA_VISIBLE_DEVICES environment variable:

apiVersion: v1
kind: Pod
metadata:
  name: vllm-openai
  namespace: training
spec:
  runtimeClassName: nvidia
  containers:
  - name: vllm-openai
    image: "vllm/vllm-openai:latest"
    args: ["--model", "Qwen/Qwen1.5-14B-Chat"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "0"
    resources:
      limits:
        nvidia.com/gpu: 1

The value is the zero-based index of the GPU on the node.

These other variables may also work, but I have not tested them: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html
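One variant worth noting (untested here): the linked page also documents passing GPU UUIDs to NVIDIA_VISIBLE_DEVICES instead of indices, which is more stable if device ordering changes. The UUID below is a placeholder; the real value can be read with nvidia-smi -L on the node.

    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # placeholder UUID from nvidia-smi -L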

anencore94 commented 7 months ago

@laszlocph Thanks for sharing your case! However, I'd like to control it in a Kubernetes-native way. 🥲

jjaymick001 commented 6 months ago

I do this via nodeSelector.

kubectl get nodes -L nvidia.com/gpu.count -L nvidia.com/gpu.product
NAME            STATUS   ROLES           AGE    VERSION   GPU.COUNT   GPU.PRODUCT
dell-mx740c-2   Ready    control-plane   3d8h   v1.26.3   1           NVIDIA-A100-PCIE-40GB
dell-mx740c-3   Ready    control-plane   3d8h   v1.26.3   2           Tesla-T4
dell-mx740c-7   Ready    <none>          3d8h   v1.26.3   2           Quadro-RTX-8000
dell-mx740c-8   Ready    <none>          3d8h   v1.26.3   2           NVIDIA-A100-PCIE-40GB

I can use gpu.product as the selector to ensure the pod lands on a node with the intended GPU type, like this:

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-ver-740c-8
spec:
  restartPolicy: OnFailure
  nodeSelector:
    nvidia.com/gpu.product: "NVIDIA-A100-PCIE-40GB"
    nvidia.com/gpu.count: "2"
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"