NVIDIA / gpu-operator

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
Apache License 2.0

vCS licenses are acquired per cluster rather than per GPU #267

Status: open (opened by kralicky 3 years ago)

kralicky commented 3 years ago

1. Issue or feature description

I have created four clusters and installed the GPU Operator in all four. Each cluster contains one node, and each node has been assigned one of the eight vGPUs available on the host (two physical GPUs, each providing four vGPUs). Everything runs as VMs on a single machine. nvidia-gridd leases four licenses from the NLS, but it should only lease two.

nvidia-smi output on the host:

❯ nvidia-smi vgpu
Fri Oct  8 18:41:01 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63                 Driver Version: 470.63                    |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  Tesla T4                   | 00000000:85:00.0             |   0%       |
|      3251649543  GRID T4-4C     | 0607...  instance-00000273   |      0%    |
+---------------------------------+------------------------------+------------+
|   1  Tesla T4                   | 00000000:C1:00.0             |   0%       |
|      3251635394  GRID T4-4C     | aa32...  instance-00000264   |      0%    |
|      3251643331  GRID T4-4C     | d5b3...  instance-0000026e   |      0%    |
|      3251652536  GRID T4-4C     | 6107...  instance-00000275   |      0%    |
+---------------------------------+------------------------------+------------+
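The disagreement comes down to which quantity is being counted: physical GPUs (two in the table above) or VMs holding a vGPU (four). Below is a minimal Python sketch that tallies both from the nvidia-smi vgpu text. It assumes the column layout of the 470.63 output shown here. That table is meant for humans and may change between driver versions. `vms_by_gpu` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re

# Excerpt of the host's `nvidia-smi vgpu` table from this issue (driver 470.63).
SAMPLE = """\
|   0  Tesla T4                   | 00000000:85:00.0             |   0%       |
|      3251649543  GRID T4-4C     | 0607...  instance-00000273   |      0%    |
+---------------------------------+------------------------------+------------+
|   1  Tesla T4                   | 00000000:C1:00.0             |   0%       |
|      3251635394  GRID T4-4C     | aa32...  instance-00000264   |      0%    |
|      3251643331  GRID T4-4C     | d5b3...  instance-0000026e   |      0%    |
|      3251652536  GRID T4-4C     | 6107...  instance-00000275   |      0%    |
"""

# Assumption: a PCI Bus-Id such as 00000000:85:00.0 marks a physical GPU row.
BUS_ID = re.compile(r"^[0-9A-Fa-f]{8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]$")


def vms_by_gpu(table: str) -> dict[str, list[str]]:
    """Map each physical GPU (keyed by Bus-Id) to the VM names using its vGPUs."""
    mapping: dict[str, list[str]] = {}
    current_gpu = None
    for line in table.splitlines():
        if not line.startswith("|"):
            continue  # skip "+---+" separator lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 2:
            continue  # skip banner and "====" rows, which have no inner "|"
        if BUS_ID.match(cells[1]):
            current_gpu = cells[1]
            mapping.setdefault(current_gpu, [])
        elif current_gpu is not None and cells[1]:
            # The VM ID is truncated ("0607..."), so use the VM Name column.
            mapping[current_gpu].append(cells[1].split()[-1])
    return mapping


by_gpu = vms_by_gpu(SAMPLE)
physical_gpus = len(by_gpu)
vms = {vm for names in by_gpu.values() for vm in names}
print(f"physical GPUs: {physical_gpus}, VMs with a vGPU: {len(vms)}")
# physical GPUs: 2, VMs with a vGPU: 4
```

For the setup in this issue it reports two physical GPUs and four VMs. Those are the two lease counts discussed in the thread: two is the per-GPU expectation and four is what nvidia-gridd actually leased.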


cdesiniotis commented 3 years ago

Hi @kralicky -- this is the expected behavior. You need a license per VM. It looks like you have 4 VMs, and so 4 licenses should be leased.

kralicky commented 3 years ago

The official documentation says the licenses are per-GPU.

shivamerla commented 3 years ago

@kralicky Please refer to this documentation: https://docs.nvidia.com/grid/13.0/grid-licensing-user-guide/index.html

For C-series NVIDIA vGPU deployments, one license per vGPU assigned to a VM is enforced through software. This license is valid for up to eight vGPU instances on a single GPU or for the assignment to a VM of one vGPU that is assigned all the physical GPU's frame buffer. When multiple C-series vGPUs are assigned to a single VM, one license for each vGPU assigned to the VM is required. One license is enforced through software. The remaining licenses are enforced through the EULA.

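One reading of the quoted rule separates the licenses the software actually leases from the licenses the EULA requires. Below is a minimal Python sketch under that assumption: one software-enforced lease per VM that has any C-series vGPU, and one required license per vGPU. `license_counts` is hypothetical. The sketch does not model the "up to eight vGPU instances on a single GPU" clause.

```python
def license_counts(vgpus_per_vm: dict[str, int]) -> tuple[int, int]:
    """Return (software_enforced, total_required) under one reading of the quoted rule.

    Assumption: a VM with at least one C-series vGPU takes one software-enforced
    lease, and every vGPU on that VM needs a license; licenses beyond the first
    per VM are required by the EULA but not checked by software.
    """
    software_enforced = sum(1 for n in vgpus_per_vm.values() if n > 0)
    total_required = sum(vgpus_per_vm.values())
    return software_enforced, total_required


# The setup in this issue: four VMs, one T4-4C vGPU each.
issue_setup = {
    "instance-00000273": 1,
    "instance-00000264": 1,
    "instance-0000026e": 1,
    "instance-00000275": 1,
}
print(license_counts(issue_setup))    # (4, 4)

# Hypothetical contrast: a single VM holding two vGPUs.
print(license_counts({"big-vm": 2}))  # (1, 2)
```

With one vGPU per VM, as in this issue, both counts are four under this reading. That is consistent with the four leases observed.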
kralicky commented 3 years ago

This doesn't make sense. Why license individual vGPUs instead of the single physical GPU? Splitting one physical GPU is exactly the use case for vGPUs; please consider changing this.

shivamerla commented 3 years ago

@kralicky We will share this feedback with the appropriate teams internally, but it would be great if you could also open a support case to clarify this.