NVIDIA / gpu-operator

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
Apache License 2.0

NVIDIA Device Plugin Only Exposes One GPU Out of Two GPUs Installed on Single Node #1079

Open amir-bialek opened 3 weeks ago

amir-bialek commented 3 weeks ago

Hey all,

I have an on-premises Kubernetes cluster with multiple nodes. One of these nodes is equipped with two different GPU models: an NVIDIA GeForce RTX 3090 and an NVIDIA GeForce RTX 4090.

When I SSH into this node and run nvidia-smi, both GPUs are properly detected and displayed. I have installed the NVIDIA Device Plugin using the gpu-operator Helm chart (https://github.com/NVIDIA/gpu-operator/tree/main/deployments/gpu-operator). However, only the RTX 4090 is being exposed as a resource to Kubernetes. Here is my current configuration:

devicePlugin:
  config:
    name: time-slicing-config-all
    create: true
    default: "any"
    data:
      any: |-
        version: v1
        flags:
          migStrategy: none
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 5

I have tried several variations of this configuration, but only one GPU type is ever exposed. Any help?
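
One way to check how many physical GPUs the plugin actually registered is to compare the node's allocatable nvidia.com/gpu count with the time-slicing replica count from the configuration above. The following is a minimal Python sketch, assuming each registered GPU is advertised `replicas` times under time-slicing. The function name and the sample allocatable values are hypothetical and are not taken from this cluster.

```python
def implied_physical_gpus(allocatable, replicas, resource="nvidia.com/gpu"):
    """Estimate how many physical GPUs sit behind a time-sliced resource count.

    `allocatable` is the `status.allocatable` mapping of a node, for example
    as found in the JSON printed by `kubectl get node <node> -o json`.
    Kubernetes reports these quantities as strings.
    """
    if replicas <= 0:
        raise ValueError("replicas must be a positive integer")
    advertised = int(allocatable.get(resource, "0"))
    if advertised % replicas != 0:
        raise ValueError("advertised count is not a multiple of replicas")
    return advertised // replicas


# Hypothetical values: with replicas: 5, two registered GPUs would be
# advertised as 10 devices, while a single registered GPU gives 5.
print(implied_physical_gpus({"nvidia.com/gpu": "10"}, replicas=5))
print(implied_physical_gpus({"nvidia.com/gpu": "5"}, replicas=5))
```

If the result is 1 on a node where nvidia-smi lists two GPUs, the second GPU was not registered with the kubelet.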

klueska commented 3 weeks ago

As mentioned here, the k8s-device-plugin doesn't support multiple GPU types per node: https://github.com/NVIDIA/k8s-device-plugin/discussions/1021#discussioncomment-11090356