GoogleCloudPlatform / container-engine-accelerators

Collection of tools and examples for managing Accelerated workloads in Kubernetes Engine
Apache License 2.0
211 stars, 150 forks

Add partition and MIG profiles for H200 #399

Status: Closed (aston-github closed this 4 weeks ago)

aston-github commented 1 month ago

This PR adds partition profiles and MIG support for H200 GPUs.

  1. Run `nvidia-smi -i 0` to obtain the GPU name.

  2. Run `sudo nvidia-smi -i 0 -mig 1` to enable MIG mode on GPU 0.

  3. Run `nvidia-smi mig -lgip` to list the available GPU instance profiles and their IDs.
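
For anyone repeating these steps on another GPU, they could be wrapped in a small script along these lines. This is only a sketch: the `nvidia-smi` invocations are the ones listed above, while the function names and the `GPU_INDEX`, `NVIDIA_SMI`, and `SUDO` variables are hypothetical and not part of this PR.

```shell
#!/bin/sh
# Sketch of the three steps above. Variable and function names are
# illustrative assumptions; only the nvidia-smi arguments come from the PR.
GPU_INDEX="${GPU_INDEX:-0}"            # the PR uses GPU 0
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}" # override if the binary lives elsewhere
SUDO="${SUDO-sudo}"                    # set SUDO= (empty) when already root

gpu_info() {
  # Step 1: print the status table for the selected GPU, which includes its name.
  "$NVIDIA_SMI" -i "$GPU_INDEX"
}

enable_mig() {
  # Step 2: enable MIG mode on the selected GPU (needs root privileges).
  $SUDO "$NVIDIA_SMI" -i "$GPU_INDEX" -mig 1
}

list_profiles() {
  # Step 3: list the GPU instance profiles and their profile IDs.
  "$NVIDIA_SMI" mig -lgip
}
```

Calling `gpu_info`, `enable_mig`, and `list_profiles` in that order reproduces the sequence used for this PR. The functions are kept separate so that step 2, which changes GPU state, can be skipped on a machine where MIG is already enabled.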

The commands are taken from the NVIDIA MIG User Guide:

https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#enable-mig-mode

https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#lgi