NVIDIA / gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0

[cherrypick][release-24.6][H100 NVL]update all-balanced MIG config #903

Closed: tariq1890 closed this 1 month ago

tariq1890 commented 1 month ago

This change improves the utilisation of the H100 NVL GPU when using the all-balanced MIG config. The GPU instance profiles reported by nvidia-smi mig -lgip on the H100 NVL are:

nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.12gb       19     7/7        10.75      No     16     1     0   |
|                                                            1      1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.12gb+me    20     1/1        10.75      No     16     1     0   |
|                                                            1      1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.24gb       15     4/4        21.62      No     26     1     0   |
|                                                            1      1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.24gb       14     3/3        21.62      No     32     2     0   |
|                                                            2      2     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.47gb        9     2/2        46.38      No     60     3     0   |
|                                                            3      3     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.47gb        5     1/1        46.38      No     64     4     0   |
|                                                            4      4     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.94gb        0     1/1        93.12      No     132    7     0   |
|                                                            8      7     1   |
+-----------------------------------------------------------------------------+
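If these profiles need to be consumed by a script rather than read by eye, the per-profile rows can be pulled out of the command output. The following is a minimal Python sketch, not part of the GPU Operator: ROW_RE, GpuInstanceProfile and parse_lgip are hypothetical names, and the sketch assumes the column order in the table header (GPU, Name, ID, Free/Total, Memory, P2P, SM, DEC, ENC) with the CE/JPEG/OFA values on a second line, which it skips. Other driver versions may format the table differently.

```python
import re
from dataclasses import dataclass

# Matches the first line of each profile row printed by `nvidia-smi mig -lgip`.
# Assumed column order: GPU, Name, ID, Free/Total, Memory (GiB), P2P, SM, DEC, ENC.
# The CE/JPEG/OFA continuation line has no "MIG" token, so it never matches.
ROW_RE = re.compile(
    r"^\|\s*(?P<gpu>\d+)\s+MIG\s+(?P<name>\S+)\s+(?P<id>\d+)\s+"
    r"(?P<free>\d+)/(?P<total>\d+)\s+(?P<mem>[\d.]+)\s+(?P<p2p>Yes|No)\s+"
    r"(?P<sm>\d+)\s+(?P<dec>\d+)\s+(?P<enc>\d+)\s*\|$"
)


@dataclass(frozen=True)
class GpuInstanceProfile:
    """Hypothetical container for the fields this sketch extracts."""
    name: str
    profile_id: int
    free: int
    total: int
    memory_gib: float
    sm: int


def parse_lgip(output: str) -> list[GpuInstanceProfile]:
    """Return one GpuInstanceProfile per matching row; other lines are skipped."""
    profiles = []
    for line in output.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            profiles.append(
                GpuInstanceProfile(
                    name=m["name"],
                    profile_id=int(m["id"]),
                    free=int(m["free"]),
                    total=int(m["total"]),
                    memory_gib=float(m["mem"]),
                    sm=int(m["sm"]),
                )
            )
    return profiles


# Two rows copied from the H100 NVL table, including their continuation lines.
SAMPLE = """\
|   0  MIG 1g.12gb       19     7/7        10.75      No     16     1     0   |
|                                                            1      1     0   |
|   0  MIG 3g.47gb        9     2/2        46.38      No     60     3     0   |
|                                                            3      3     0   |
"""

for p in parse_lgip(SAMPLE):
    print(p.name, p.profile_id, p.memory_gib)
```

Running it prints one line per profile (name, profile ID, memory in GiB). Anchoring the pattern on the literal MIG token is what lets it ignore the table borders and the CE/JPEG/OFA continuation lines without tracking row pairs.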

With the new all-balanced config, the MIG slices together provide 89.5 GiB of memory (10.75 + 10.75 + 21.62 + 46.38 GiB).
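The memory total is simple arithmetic over the profile table, and the same check can be scripted when comparing candidate layouts. A minimal Python sketch: the layout below (two 1g.12gb, one 2g.24gb and one 3g.47gb instance) is an assumption chosen because it reproduces the four sizes in the sum, not a copy of the config in this PR, and PROFILES, LAYOUT and summarize are hypothetical names. Decimal is used so the GiB figures add up exactly instead of picking up floating-point noise.

```python
from decimal import Decimal

# Figures transcribed from the `nvidia-smi mig -lgip` table for the H100 NVL:
# (compute slices implied by the "Ng" prefix, usable memory in GiB).
PROFILES = {
    "1g.12gb": (1, Decimal("10.75")),
    "1g.24gb": (1, Decimal("21.62")),
    "2g.24gb": (2, Decimal("21.62")),
    "3g.47gb": (3, Decimal("46.38")),
    "4g.47gb": (4, Decimal("46.38")),
    "7g.94gb": (7, Decimal("93.12")),
}

# Hypothetical layout that matches the four sizes summed in the PR text.
LAYOUT = {"1g.12gb": 2, "2g.24gb": 1, "3g.47gb": 1}


def summarize(layout: dict[str, int]) -> tuple[int, Decimal]:
    """Return (compute slices used, total MIG memory in GiB) for a layout."""
    slices = sum(PROFILES[name][0] * count for name, count in layout.items())
    memory = sum(
        (PROFILES[name][1] * count for name, count in layout.items()),
        Decimal(0),
    )
    return slices, memory


slices, memory = summarize(LAYOUT)
full = PROFILES["7g.94gb"][1]  # memory of a single full-GPU instance, for comparison
print(f"compute slices: {slices}/7")
print(f"MIG memory: {memory} GiB of {full} GiB ({memory / full:.1%})")
```

With this layout it reports all 7 compute slices in use and 89.50 GiB of MIG memory against the 93.12 GiB of a single 7g.94gb instance. Counting compute slices is only a coarse sanity check: it does not model memory slices or MIG's placement rules, so a layout that passes here can still be rejected by the driver.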

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
(cherry picked from commit e61688015fa2ffb64415f90cc53878b2da9bb471)