nvidia-installer

This component compiles the NVIDIA kernel modules for Garden Linux inside a Docker image at build time. Running the image as a DaemonSet in a Kubernetes cluster installs the GPU driver on the targeted nodes.

Building the Docker image

To build the image for NVIDIA driver version 535.86.10 on Garden Linux 934.11 for amd64 CPUs, run:

docker build . --platform=linux/amd64 --build-arg TARGET_ARCH=amd64 --build-arg DRIVER_VERSION=535.86.10 --build-arg GARDENLINUX_VERSION=934.11

To build for a baremetal node (as opposed to a cloud VM), add --build-arg KERNEL_TYPE=baremetal to the command above.
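If you build several driver and Garden Linux combinations, a small wrapper can keep the build arguments consistent. The following is a minimal sketch, not part of the repository: the function name build_image, the default IMAGE_REPO value registry.example.com/nvidia-installer and the tag scheme are hypothetical placeholders. By default it only prints the docker command (DRY_RUN=1); set DRY_RUN=0 to actually run the build.

```shell
#!/usr/bin/env bash
# Hypothetical build wrapper (not part of the repository).
# IMAGE_REPO and the tag scheme below are placeholders.
set -euo pipefail

IMAGE_REPO="${IMAGE_REPO:-registry.example.com/nvidia-installer}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the docker command, 0 = run it

# build_image TARGET_ARCH DRIVER_VERSION GARDENLINUX_VERSION [KERNEL_TYPE]
build_image() {
  local arch="$1" driver="$2" gl="$3" kernel_type="${4:-}"
  # Keep the arguments in an array so each one stays a single word.
  local -a args=(
    build .
    "--platform=linux/${arch}"
    --build-arg "TARGET_ARCH=${arch}"
    --build-arg "DRIVER_VERSION=${driver}"
    --build-arg "GARDENLINUX_VERSION=${gl}"
  )
  # Only add KERNEL_TYPE when requested (e.g. "baremetal").
  if [[ -n "${kernel_type}" ]]; then
    args+=(--build-arg "KERNEL_TYPE=${kernel_type}")
  fi
  # Tag the image for the later push; a kernel-type suffix keeps
  # cloud and baremetal images apart.
  args+=(-t "${IMAGE_REPO}:${driver}-${gl}${kernel_type:+-${kernel_type}}")

  if [[ "${DRY_RUN}" == "1" ]]; then
    echo docker "${args[@]}"
  else
    docker "${args[@]}"
  fi
}

# The cloud-VM build from the command above.
build_image amd64 535.86.10 934.11
```

Passing baremetal as the fourth argument (build_image amd64 535.86.10 934.11 baremetal) adds the KERNEL_TYPE build argument and a -baremetal tag suffix, so the cloud and baremetal images do not overwrite each other in the registry.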

Deploying nvidia-installer with Helm

First, build the image as described above and push it to your Docker registry.

Next, edit the file todo-values.yaml in the helm folder to specify the location of the Docker image, the NVIDIA driver version and the Garden Linux version. The Garden Linux version tells the DaemonSet which nodes to target: the sample nodeAffinity values assume that your GPU nodes carry a gpu label and an os-version label set to the Garden Linux version.
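The nodeAffinity only matches nodes that already carry these labels. If your nodes are not labelled yet, the following minimal sketch applies them with kubectl label. The label keys gpu and os-version come from the sample values described above; the value true for gpu, the function name label_gpu_node and the node names gpu-node-1 and gpu-node-2 are hypothetical, so check them against the nodeAffinity in your copy of todo-values.yaml. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: label GPU nodes for the sample nodeAffinity.
# Label keys "gpu" and "os-version" are from the README; the value
# "true" and the node names below are hypothetical.
set -euo pipefail

GARDENLINUX_VERSION="${GARDENLINUX_VERSION:-934.11}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the kubectl commands, 0 = run them

# label_gpu_node NODE_NAME
label_gpu_node() {
  local node="$1"
  # --overwrite lets the script update an existing os-version label
  # after a Garden Linux upgrade.
  local -a cmd=(kubectl label node "${node}" gpu=true
                "os-version=${GARDENLINUX_VERSION}" --overwrite)
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}

for node in gpu-node-1 gpu-node-2; do   # hypothetical node names
  label_gpu_node "${node}"
done
```

Labels applied by hand are lost when a node is replaced, so if your cluster provider lets you set node labels in the worker pool configuration, that is usually the more durable place for them.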

Now deploy the DaemonSets for the NVIDIA driver installer and the NVIDIA Device Plugin, along with the related imagePullSecret, with the following command:

helm install nvidia ./helm --namespace kube-system --values helm/todo-values.yaml
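Before installing, it can help to render the chart locally and confirm that the image location, driver version and Garden Linux version from todo-values.yaml end up where you expect. The sketch below wraps helm template for that preview and uses helm upgrade --install, which installs the release the first time and upgrades it on later runs. The script layout and its preview/install subcommands are hypothetical; the release name, chart path, namespace and values file are taken from the command above. By default it only prints the helm commands; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch: preview the rendered chart, then install or upgrade the release.
# Release name, chart path, namespace and values file match the README.
set -euo pipefail

RELEASE=nvidia
CHART=./helm
NAMESPACE=kube-system
VALUES=helm/todo-values.yaml
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the helm commands, 0 = run them

run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "$@"
  else
    "$@"
  fi
}

# Render the manifests locally; nothing is sent to the cluster.
preview() {
  run helm template "${RELEASE}" "${CHART}" --namespace "${NAMESPACE}" --values "${VALUES}"
}

# Install the release, or upgrade it in place if it already exists.
install_release() {
  run helm upgrade --install "${RELEASE}" "${CHART}" --namespace "${NAMESPACE}" --values "${VALUES}"
}

case "${1:-preview}" in
  preview) preview ;;
  install) install_release ;;
  *) echo "usage: $0 [preview|install]" >&2; exit 2 ;;
esac
```

Saved as a file such as deploy.sh (name hypothetical), you would run DRY_RUN=0 ./deploy.sh preview first and DRY_RUN=0 ./deploy.sh install once the rendered output looks right.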

Note that the resulting Pods must run in a namespace that is allowed to create Pods with priorityClassName: system-node-critical, such as the kube-system namespace.
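After the install, a few read-only kubectl queries show whether the Pods were admitted and scheduled. This is a minimal sketch: this README does not name the DaemonSets, so it lists everything in the namespace rather than guessing names, and the function name verify is hypothetical. A ResourceQuota scoped to PriorityClass is one way a cluster can restrict which namespaces may use system-node-critical, so it is worth checking if the Pods are never created. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: read-only checks after "helm install".
# DaemonSet names are not given in the README, so nothing is filtered by name.
set -euo pipefail

NAMESPACE="${NAMESPACE:-kube-system}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the kubectl commands, 0 = run them

run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "$@"
  else
    "$@"
  fi
}

verify() {
  # The built-in priority class requested by the Pods.
  run kubectl get priorityclass system-node-critical
  # Quotas with a PriorityClass scope can block system-node-critical Pods.
  run kubectl get resourcequota --namespace "${NAMESPACE}"
  # DESIRED and READY should equal the number of targeted GPU nodes.
  run kubectl get daemonsets --namespace "${NAMESPACE}"
  # -o wide shows the node each Pod was scheduled on.
  run kubectl get pods --namespace "${NAMESPACE}" -o wide
}

verify
```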
