NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

nvidia.com/gpu.product and nvidia.com/gpu.replicas do not reflect heterogeneous device setup #731

Open Suckzoo opened 1 year ago

Suckzoo commented 1 year ago

Hello,

We're testing gpu-feature-discovery on our DGX machine.

The DGX machine has two types of GPU: one is "NVIDIA DGX Display" and the other is "NVIDIA A100-SXM4-80GB". Currently, the gpu.product and gpu.replicas node labels can only hold information about a single GPU model. We're seeing the values of those two labels flip periodically between the two models:

nvidia.com/gpu.product: NVIDIA-DGX-Display
nvidia.com/gpu.replicas: 1

and

nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
nvidia.com/gpu.replicas: 4

It looks like we need to introduce another set of labels capable of holding information about multiple GPU devices, for example as sketched below.
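(A purely hypothetical sketch of what such labels could look like; GFD does not currently emit anything like this, and the indexed label names here are invented for illustration only:)

nvidia.com/gpu.0.product: NVIDIA-DGX-Display
nvidia.com/gpu.0.replicas: 1
nvidia.com/gpu.1.product: NVIDIA-A100-SXM4-80GB
nvidia.com/gpu.1.replicas: 4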

klueska commented 1 year ago

We are aware of a similar, yet slightly different issue with GFD support on DGX-Station machines. Our plan for the next release is to completely filter out all DISPLAY devices, and only support COMPUTE devices in our enumeration of GPUs for both the device plugin and GFD. In the future, we may decide to support DISPLAY devices, but at that point they would show up as a different type of allocatable device (e.g. nvidia.com/display instead of nvidia.com/gpu), and the labels applied by GFD would reflect this similarly (i.e. nvidia.com/display.product and nvidia.com/display.replicas, etc.).
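Under that future scheme, a node with both device classes might end up labeled roughly as follows (hypothetical values, extrapolated from the label names in the comment above):

nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
nvidia.com/gpu.replicas: 4
nvidia.com/display.product: NVIDIA-DGX-Display
nvidia.com/display.replicas: 1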

Suckzoo commented 1 year ago

@klueska Thanks for your quick response. One quick question: considering a node with two RTX 2080s and two RTX 3090s (or any two different GPU models; I don't know whether that's a common setup), how would GFD work in such a situation?

klueska commented 1 year ago

It only reports one of them at present: whichever one happens to show up as index 0 when calling into NVIDIA's NVML library.

Suckzoo commented 1 year ago

I meant how GFD will work in the future. Sorry for the confusion.

klueska commented 1 year ago

We added support about 6 months ago to allow such setups to be detected and to let users assign a different resource name to each model (i.e. nvidia.com/rtx-2080 vs. nvidia.com/rtx-3090), but it got reverted because our product team wasn't happy putting arbitrary resource naming in the hands of users.
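For illustration only: under that reverted design, a pod would presumably have requested one of the renamed resources like any other extended resource. The resource names come from the comment above; the pod spec itself is a hypothetical sketch, not something the current plugin supports:

apiVersion: v1
kind: Pod
metadata:
  name: train-rtx-3090
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/rtx-3090: 1   # user-assigned, per-model resource name (reverted feature)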

klueska commented 1 year ago

This is how it would have worked: https://docs.google.com/document/d/1dL67t9IqKC2-xqonMi6DV7W2YNZdkmfX7ibB6Jb-qmk/edit

sftim commented 1 year ago

There is a KEP for dynamic resource allocation. That architecture allows a Pod to find a node where some suitable GPU exists, even when the node has multiple GPUs. Those GPUs can be fixed (even soldered in!); it doesn't have to be a hotplug scenario.

To me, that'd be the way forward for clusters where nodes have a mix of GPUs.

klueska commented 1 year ago

Yes, that is the plan going forward. The POC of our DRA resource driver for GPUs can be found here: https://gitlab.com/nvidia/cloud-native/k8s-dra-driver

It will soon include the notion of a deviceSelector in the GpuClaimParameters object so you can do things like:

apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: gpu-test
  name: a100
spec:
  count: 1
  selector:
    andExpression:
      - productName: "*A100*"
      - driverVersion:
          value: "460"
          operator: GreaterThan

or

apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: gpu-test
  name: t4
spec:
  count: 1
  selector:
    andExpression:
      - productName: "*T4*"
      - driverVersion:
          value: "460"
          operator: GreaterThan

etc.
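For context, a claim like the a100 one above would be consumed through the standard DRA objects. Here is a minimal sketch assuming the resource.k8s.io/v1alpha1 API from the DRA KEP; the resource class name gpu.nvidia.com and the template wiring are assumptions for illustration, not confirmed by this thread:

apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  namespace: gpu-test
  name: a100-template
spec:
  spec:
    resourceClassName: gpu.nvidia.com      # assumed class name published by the DRA driver
    parametersRef:
      apiGroup: gpu.resource.nvidia.com
      kind: GpuClaimParameters
      name: a100                           # the GpuClaimParameters object defined above
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test
  name: a100-pod
spec:
  resourceClaims:
  - name: gpu
    source:
      resourceClaimTemplateName: a100-template
  containers:
  - name: ctr
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      claims:
      - name: gpu                          # container consumes the GPU allocated for this claim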