NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs

Container Toolkit Support GKE/COS Platform and its test coverage #209

Open Dragoncell opened 9 months ago

Dragoncell commented 9 months ago

For pods managed by the GPU Operator, once driver installation has finished, they rely on the container toolkit starting on the node to set up the NVIDIA container runtime:

- Download the nvidia container runtime, hooks, and container CTK (nvidia-ctk) and copy them from the container to /run/nvidia/toolkit on the host. [link]
- Update the containerd config file based on the configured container runtime (e.g. nvidia or nvidia-cdi).
- Generate the CDI spec for management containers if the runtime is nvidia-cdi.

The toolkit is a necessary component of the GPU Operator. To make it work on COS, we need to:

Support the necessary binaries from the container toolkit on the COS platform. So far, the container runtime, hooks, and container CTK are not yet supported (see the supported platform lists).

Starting from COS 109, nvidia-ctk is pre-built into COS. However, in the current state (intermediate CDI mode), it still requires the nvidia container runtime (nvidia-cdi) binaries. For legacy mode support, the nvidia container runtime (nvidia-legacy) and its hooks are also required.

The goal is to achieve the same container toolkit functionality on the COS platform with a custom driver installation path and container runtime binaries path.

elezar commented 9 months ago

@Dragoncell I am not too familiar with COS. Is there anything specific to the OS that means that the toolkit-container (with some modification) cannot be used to install the components of the NVIDIA Container Toolkit? What it essentially does is:

  1. Copy the pre-built binaries and libraries from the running container image to a location mounted from the host.
  2. Update the config file (on the host through the mount) for this location and the location of the containerized driver.
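Roughly, the equivalent manual steps would look something like the following (an illustrative sketch with default paths, not the operator's actual script):

    # 1. copy the pre-built runtime, hook, CLI and nvidia-ctk into a directory mounted from the host
    #    (the operator defaults to something like /usr/local/nvidia/toolkit)
    # 2. write the toolkit config so it points at those binaries and at the driver root
    #    (e.g. root = /run/nvidia/driver for a containerized driver)
    # 3. register the nvidia runtime with the container engine on the host and restart it, e.g.:
    nvidia-ctk runtime configure --runtime=containerd
    systemctl restart containerd
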
bobbypage commented 9 months ago

/cc

One thing that comes to mind is that the COS filesystem is read-only (https://cloud.google.com/container-optimized-os/docs/concepts/disks-and-filesystem), so only certain paths are mounted with exec. So I think the path on the host should be configurable, i.e. we can't use the default of /usr/bin since it's read-only.
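For example, one way to check that a candidate host directory is actually mounted exec on a COS node (an illustrative check, using the path discussed later in this thread):

    # show the mount backing the directory and its options (look for 'rw' and the absence of 'noexec')
    findmnt -T /home/kubernetes/bin -o TARGET,SOURCE,OPTIONS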

cdesiniotis commented 9 months ago

The install path for NVIDIA Container Toolkit is already configurable today with the operator: https://github.com/NVIDIA/gpu-operator/blob/v23.9.1/api/v1/clusterpolicy_types.go#L669-675. Here is the setting in the helm chart: https://github.com/NVIDIA/gpu-operator/blob/v23.9.1/deployments/gpu-operator/values.yaml#L229
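For example, assuming the toolkit.installDir key from the linked values.yaml and a chart installed from the NVIDIA Helm repository, something like:

    # point the toolkit install at a writable, exec-mounted path on the host
    helm upgrade -i gpu-operator nvidia/gpu-operator \
      -n gpu-operator \
      --set toolkit.installDir=/home/kubernetes/bin/nvidia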

Dragoncell commented 9 months ago

Thanks for the pointer to the config.

From my understanding, to use the toolkit in the GPU Operator in our case, we require two changes. Below is my test setup for trying it out, based on https://github.com/NVIDIA/gpu-operator/tree/release-23.9.

Changes:

Changes to /assets/state-container/toolkit/0500_daemonset.yaml: a) disable the driver-validation before we support it; b) update the driver install path and the env:

env:
  - name: NVIDIA_DRIVER_ROOT
    value: "/home/kubernetes/bin/nvidia"
  - name: DRIVER_ROOT
    value: "/home/kubernetes/bin/nvidia"
  - name: DRIVER_ROOT_CTR_PATH
    value: "/home/kubernetes/bin/nvidia"

volumes:
  - name: driver-install-path
    hostPath:
      path: /home/kubernetes/bin/nvidia

Update: it seems the env does not necessarily need to be set in the toolkit's daemonset; in the configmap, it can be updated to

    driver_root=/home/kubernetes/bin/nvidia

Changes in values.yaml:

installDir: "/home/kubernetes/bin/nvidia"

From the test results:

a) It installs the binaries in the correct path:

/home/kubernetes/bin/nvidia/toolkit$ ls
libnvidia-container-go.so.1       libnvidia-container.so.1.14.2  nvidia-container-runtime            nvidia-container-runtime.cdi       nvidia-container-runtime.legacy.real  nvidia-ctk
libnvidia-container-go.so.1.14.2  nvidia-container-cli           nvidia-container-runtime-hook       nvidia-container-runtime.cdi.real  nvidia-container-runtime.real         nvidia-ctk.real
libnvidia-container.so.1          nvidia-container-cli.real      nvidia-container-runtime-hook.real  nvidia-container-runtime.legacy    nvidia-container-toolkit

b) The pod is failing in a crash loop. The pod status:

nvidia-container-toolkit-daemonset-nlxdv                   0/1     CrashLoopBackOff   4 (72s ago)   3m10s

pod logs:

accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false

[nvidia-container-cli]
  environment = []
  ldconfig = "@/run/nvidia/driver/sbin/ldconfig.real"
  load-kmods = true
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-container-cli"
  root = "/home/kubernetes/bin/nvidia"

[nvidia-container-runtime]
  log-level = "info"
  mode = "cdi"
  runtimes = ["docker-runc", "runc"]

  [nvidia-container-runtime.modes]

    [nvidia-container-runtime.modes.cdi]
      annotation-prefixes = ["nvidia.cdi.k8s.io/"]
      default-kind = "management.nvidia.com/gpu"

    [nvidia-container-runtime.modes.csv]
      mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-container-runtime-hook"
  skip-mode-detection = true

[nvidia-ctk]
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk"

time="2024-02-06T06:24:07Z" level=info msg="Creating control device nodes at /home/kubernetes/bin/nvidia"
time="2024-02-06T06:24:07Z" level=fatal msg="error: failed to create control device nodes: failed to create device node nvidiactl: no such file or directory"
time="2024-02-06T06:24:07Z" level=info msg="Shutting Down"
time="2024-02-06T06:24:07Z" level=error msg="error running nvidia-toolkit: unable to install toolkit: error running [toolkit install --toolkit-root /home/kubernetes/bin/nvidia/toolkit] command: exit status 1"

Wondering why it failed to find the file or directory under /home/kubernetes/bin/nvidia?

Next Step

a) Support both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT envs in the toolkit daemonset, similar to installDir; e.g. a parameter called InstallDriverRoot in values.yaml, and the operator code can update the daemonset's env accordingly.

b) What's the recommended way to update the driver root in the toolkit? Does using DRIVER_ROOT, NVIDIA_DRIVER_ROOT, or DRIVER_ROOT_CTR_PATH in the env look right to you for passing the custom path to the nvidia toolkit, or should it go through the configmap?

Does the above make sense? Let me know your thoughts, thanks.

elezar commented 9 months ago

@Dragoncell the operands of the GPU Operator have some logic included to automatically detect the location where the driver is available (the driver root).

For example, in the case of the container toolkit we construct an entrypoint.sh here: https://github.com/NVIDIA/gpu-operator/blob/5f36d3600da50e6a0239996a7b12f677eb66a671/assets/state-container-toolkit/0400_configmap.yaml#L10-L22 that checks the output of the driver validator and sets the NVIDIA_DRIVER_ROOT envvar.
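Paraphrased, the idea is roughly the following (not the verbatim script; see the linked configmap):

    # pick the driver root based on the driver validator's result
    if [ -f /run/nvidia/validations/host-driver-ready ]; then
      export NVIDIA_DRIVER_ROOT=/                   # driver pre-installed on the host
    else
      export NVIDIA_DRIVER_ROOT=/run/nvidia/driver  # driver managed by the driver container
    fi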

This logic would have to be updated to allow for a custom driver root to be specified. There is a merge request outstanding (see https://gitlab.com/nvidia/kubernetes/gpu-operator/-/merge_requests/960) where this has been proposed.

One of the issues that has prevented us from merging the MR as-is is that we currently assume that a driver root (i.e. the path defined by NVIDIA_DRIVER_ROOT) is a full filesystem that can be chrooted to. The fact that your /home/kubernetes/bin/nvidia location is not such a root is the reason for the errors that you are seeing when creating the control device nodes in the NVIDIA Container Toolkit (I have created https://github.com/NVIDIA/nvidia-container-toolkit/issues/344 to at least provide an option to disable this). We have started separating the notion of a driver root (in the context of libraries) and a device root for the creation of device nodes.

With regards to your next steps:

a) Support both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT envs in the toolkit daemonset, similar to installDir; e.g. a parameter called InstallDriverRoot in values.yaml, and the operator code can update the daemonset's env accordingly.

I think that this in general makes sense. A user should be able to specify both where their driver is rooted and where their device nodes are rooted. If these are not specified, we will fall back to autodetecting them as we currently do.

Maybe extending the operator values.yaml as follows:

diff --git a/deployments/gpu-operator/values.yaml b/deployments/gpu-operator/values.yaml
index 359b73c2..957ac1fb 100644
--- a/deployments/gpu-operator/values.yaml
+++ b/deployments/gpu-operator/values.yaml
@@ -123,6 +123,15 @@ mig:
   strategy: single

 driver:
+  # libraryRoot specifies the root at which the driver libraries are available.
+  libraryRoot: "auto"
+  # deviceRoot specifies the root at which NVIDIA device nodes are available.
+  # If this is unspecified or empty, the value of the libraryRoot will be used.
+  # Note that if driver.libraryRoot is set to auto, the resolved value is used.
+  # For a value of 'auto', the device root is detected. Here, if the resolved
+  # libraryRoot is a full filesystem such as '/' or '/run/nvidia/driver' when
+  # managed by the driver container this path will be used.
+  deviceRoot: ""
   enabled: true
   nvidiaDriverCRD:
     enabled: false

With regards to:

b) What's the recommended way to update the driver root in the toolkit? Does using DRIVER_ROOT, NVIDIA_DRIVER_ROOT, or DRIVER_ROOT_CTR_PATH in the env look right to you for passing the custom path to the nvidia toolkit, or should it go through the configmap?

The short answer is NVIDIA_DRIVER_ROOT. In general, DRIVER_ROOT and NVIDIA_DRIVER_ROOT can be considered aliases of each other -- although this is not consistently applied in the container toolkit. From the context of the usage in the GPU Operator, only NVIDIA_DRIVER_ROOT is considered (as defined here https://github.com/NVIDIA/nvidia-container-toolkit/blob/15d905def056f37da6fa67be25b363095cdab79a/tools/container/toolkit/toolkit.go#L124) (I have also created #343 to track making this consistent).

Note that the DRIVER_ROOT_CTR_PATH is the path at which the driver root is mounted into the container where the toolkit is running. This is important in the case of containerized CDI spec generation since the generated spec will locate libraries relative to this path and these need to be transformed to host paths. In the case of our other components such as the Device Plugin, we generally hardcode this path to /driver-root in the container to make it easier to reason about.
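As a rough illustration of that path transformation (hypothetical values, not the toolkit's actual code):

    # a library discovered under the container-side mount is rewritten to the corresponding host path
    DRIVER_ROOT_CTR_PATH=/driver-root                  # where the driver root is mounted in the container
    NVIDIA_DRIVER_ROOT=/home/kubernetes/bin/nvidia     # where it actually lives on the host

    ctr_path=/driver-root/lib64/libcuda.so.535.104.12
    host_path="${NVIDIA_DRIVER_ROOT}${ctr_path#"$DRIVER_ROOT_CTR_PATH"}"
    echo "$host_path"   # -> /home/kubernetes/bin/nvidia/lib64/libcuda.so.535.104.12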

Dragoncell commented 9 months ago

Thanks for the above changes. I cherry-picked the changes and tested them out using the GPU Operator with a modified env for the container toolkit:

  - name: CREATE_DEVICE_NODES
    value: ""

I saw the create-device-node error is gone, but encountered a new error like the one below:

Using config:
accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"

[nvidia-container-cli]
  environment = []
  ldconfig = "@/home/kubernetes/bin/nvidia/sbin/ldconfig"
  load-kmods = true
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-container-cli"
  root = "/home/kubernetes/bin/nvidia"

[nvidia-container-runtime]
  log-level = "info"
  mode = "cdi"
  runtimes = ["docker-runc", "runc", "crun"]

  [nvidia-container-runtime.modes]

    [nvidia-container-runtime.modes.cdi]
      annotation-prefixes = ["nvidia.cdi.k8s.io/"]
time="2024-02-13T23:34:47Z" level=info msg="Generating CDI spec for management containers"
      default-kind = "management.nvidia.com/gpu"
      spec-dirs = ["/etc/cdi", "/var/run/cdi"]

    [nvidia-container-runtime.modes.csv]
      mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-container-runtime-hook"
  skip-mode-detection = true

[nvidia-ctk]
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia*: pattern /dev/nvidia* not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-caps/nvidia-cap*: pattern /dev/nvidia-caps/nvidia-cap* not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-modeset: pattern /dev/nvidia-modeset not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-uvm-tools: pattern /dev/nvidia-uvm-tools not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-uvm: pattern /dev/nvidia-uvm not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidiactl: pattern /dev/nvidiactl not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia*: pattern /dev/nvidia* not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-caps/nvidia-cap*: pattern /dev/nvidia-caps/nvidia-cap* not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-modeset: pattern /dev/nvidia-modeset not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-uvm-tools: pattern /dev/nvidia-uvm-tools not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidia-uvm: pattern /dev/nvidia-uvm not found"
time="2024-02-13T23:34:47Z" level=warning msg="Could not locate /dev/nvidiactl: pattern /dev/nvidiactl not found"
time="2024-02-13T23:34:47Z" level=fatal msg="error: error generating CDI specification: failed to genereate CDI spec for management containers: no NVIDIA device nodes found"
time="2024-02-13T23:34:47Z" level=info msg="Shutting Down"
time="2024-02-13T23:34:47Z" level=error msg="error running nvidia-toolkit: unable to install toolkit: error running [toolkit install --toolkit-root /home/kubernetes/bin/nvidia/toolkit] command: exit status 1"

From the container toolkit's assumptions, I guess it thinks that /dev is under the driverRoot directory. However, in our case:

Under the host root, /dev contains the following (ls | grep nvidia):
nvidia-caps
nvidia-modeset
nvidia-uvm
nvidia-uvm-tools
nvidia0
nvidiactl

Under the hostDriverRoot /home/kubernetes/bin/nvidia:
NVIDIA-Linux-x86_64-535.104.12.run  bin  bin-workdir  drivers  drivers-workdir  firmware  lib64  lib64-workdir  nvidia-drivers-535.104.12.tgz  nvidia-installer.log  share  toolkit  vulkan

I also tried specifying the root as "/", and it failed with another error:

Using config:
accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"

[nvidia-container-cli]
  environment = []
  ldconfig = "@/sbin/ldconfig.real"
  load-kmods = true
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-container-cli"
  root = "/"

[nvidia-container-runtime]
  log-level = "info"
  mode = "cdi"
  runtimes = ["docker-runc", "runc", "crun"]

  [nvidia-container-runtime.modes]

    [nvidia-container-runtime.modes.cdi]
      annotation-prefixes = ["nvidia.cdi.k8s.io/"]
      default-kind = "management.nvidia.com/gpu"
      spec-dirs = ["/etc/cdi", "/var/run/cdi"]

    [nvidia-container-runtime.modes.csv]
      mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-container-runtime-hook"
  skip-mode-detection = true

[nvidia-ctk]
  path = "/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk"
time="2024-02-13T23:51:46Z" level=info msg="Generating CDI spec for management containers"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidia-modeset as /dev/nvidia-modeset"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidia-uvm as /dev/nvidia-uvm"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidia0 as /dev/nvidia0"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidiactl as /dev/nvidiactl"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidia-caps/nvidia-cap1 as /dev/nvidia-caps/nvidia-cap1"
time="2024-02-13T23:51:46Z" level=info msg="Selecting /host/dev/nvidia-caps/nvidia-cap2 as /dev/nvidia-caps/nvidia-cap2"
time="2024-02-13T23:51:46Z" level=fatal msg="error: error generating CDI specification: failed to genereate CDI spec for management containers: failed to get CUDA version: failed to locate libcuda.so: pattern libcuda.so.*.* not found\n64-bit library libcuda.so.*.*: not found"
time="2024-02-13T23:51:46Z" level=info msg="Shutting Down"
time="2024-02-13T23:51:46Z" level=error msg="error running nvidia-toolkit: unable to install toolkit: error running [toolkit install --toolkit-root /home/kubernetes/bin/nvidia/toolkit] command: exit status 1"

Next step: As you mentioned, "We have started separating the notion of a driver root (from the context of libraries) and a device root for the creation of device nodes." Wondering what the progress is there? I searched the code, and it seems there are variables called librarySearchPaths and devRoot; will they help in this case?

elezar commented 9 months ago

Thanks for reporting this. The issue here is that the definition of what a "driverRoot" is from the perspective of the toolkit container is pretty rigid. It currently means that this folder is a chroot-able filesystem and BOTH the libraries AND device nodes are rooted there.

I have created #360 to add the option for specifying these separately. Since you're using /home/kubernetes/bin/nvidia for both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT_CTR_PATH, that should still work as expected and it should only be required to mount / to /host in the container (this should already be done by the operator) and set NVIDIA_DEV_ROOT and DEV_ROOT_CTR_PATH accordingly.

Dragoncell commented 9 months ago

Thanks for the proposed dev-root option. I cherry-picked the commit on top of the change disabling device node creation and tested it out with the envs below:

    export NVIDIA_DRIVER_ROOT=/home/kubernetes/bin/nvidia
    export DRIVER_ROOT_CTR_PATH=/home/kubernetes/bin/nvidia
    export NVIDIA_DEV_ROOT=/
    export DEV_ROOT_CTR_PATH=/host

and it works for the container toolkit; the logs look good too:

$ kubectl get pods -n gpu-operator
nvidia-container-toolkit-daemonset-dlnb2                   1/1     Running            0              8m17s

Then I manually SSHed into the node and ran the following for the nvidia device plugin:

sudo touch /run/nvidia/validations/toolkit-ready  
sudo touch /run/nvidia/validations/host-driver-ready

with the below env in the configmap of the device plugin:

driver_root=/home/kubernetes/bin/nvidia
container_driver_root=$driver_root

export NVIDIA_DRIVER_ROOT=$driver_root
export CONTAINER_DRIVER_ROOT=$container_driver_root
export NVIDIA_CTK_PATH=/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk
export PATH="$PATH:/home/kubernetes/bin/nvidia/bin";
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kubernetes/bin/nvidia/lib64;

I saw the pod fail due to the below error:

time="2024-02-14T20:58:06Z" level=info msg="Generating CDI spec for resource: k8s.device-plugin.nvidia.com/gpu"
time="2024-02-14T20:58:06Z" level=warning msg="Could not locate /dev/nvidia0: pattern /dev/nvidia0 not found"
time="2024-02-14T20:58:06Z" level=warning msg="Could not locate /dev/nvidia0: pattern /dev/nvidia0 not found"
time="2024-02-14T20:58:06Z" level=warning msg="Could not locate /dev/nvidia0: pattern /dev/nvidia0 not found"
E0214 20:58:06.031396       1 main.go:123] error starting plugins: error creating plugin manager: unable to create cdi spec file: failed to get CDI spec: failed to create discoverer for common entities: error constructing discoverer for graphics mounts: failed to construct library locator: error loading ldcache: open /home/kubernetes/bin/nvidia/etc/ld.so.cache: no such file or directory

Then I also tried with the below env on the device plugin:

driver_root=/
container_driver_root=/host

export NVIDIA_DRIVER_ROOT=$driver_root
export CONTAINER_DRIVER_ROOT=$container_driver_root
export NVIDIA_CTK_PATH=/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk
export PATH="$PATH:/home/kubernetes/bin/nvidia/bin";
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kubernetes/bin/nvidia/lib64;

It seems like the device plugin is in a working state:

$ kubectl get pods -n gpu-operator
NAME                                                       READY   STATUS     RESTARTS   AGE
gpu-feature-discovery-ktg68                                1/1     Running    0          2m48s
gpu-operator-f58c4c94-lv9lk                                1/1     Running    0          3m10s
noperator-node-feature-discovery-master-79487579c6-gxgxn   1/1     Running    0          3m10s
noperator-node-feature-discovery-worker-sxwfz              1/1     Running    0          3m10s
nvidia-container-toolkit-daemonset-mp7bd                   1/1     Running    0          2m49s
nvidia-dcgm-exporter-8k8sx                                 1/1     Running    0          2m48s
nvidia-device-plugin-daemonset-vspsp                       1/1     Running    0          2m48s

and the log looks good too:

I0214 21:16:21.077283       1 server.go:165] Starting GRPC server for 'nvidia.com/gpu'
I0214 21:16:21.078023       1 server.go:117] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0214 21:16:21.081576       1 server.go:125] Registered device plugin for 'nvidia.com/gpu' with Kubelet

Then I deployed a GPU workload to try it out:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
       nvidia.com/gpu: 1

and the pod is running fine too:

$ kubectl get pods 
NAME         READY   STATUS    RESTARTS   AGE
my-gpu-pod   1/1     Running   0          48s

Questions:

  1. Setting the driver root to / seems to work for the device plugin; however, it differs from the value set in the container toolkit. I would like to set the driver root to /home/kubernetes/bin/nvidia, but I didn't find a similar device root setting in the k8s device plugin repo: https://github.com/NVIDIA/k8s-device-plugin/blob/31f01c2e0c291443c1ddbefc8cdba55768c11275/cmd/nvidia-device-plugin/main.go#L68. In this case, what is the recommended setup for the device plugin? Thanks.

  2. I started the GPU Operator using the CDI config like this. Besides the GPU pods running fine, what other signals or logs can we look at to verify it is indeed using the CDI spec through the device plugin as we wanted? Thanks.

    helm upgrade -i --create-namespace --namespace gpu-operator noperator deployments/gpu-operator --set driver.enabled=false --set cdi.enabled=true --set cdi.default=true --set operator.runtimeClass=nvidia-cdi
elezar commented 9 months ago

@Dragoncell thanks for the update. I will have to dig a bit further into what is happening here. What I assume is happening is that the device plugin is being started as a management container and since the nvidia-cdi runtime is being used, the driver files and devices are being mounted as expected into the device plugin container. This means that the device detection is working as expected, but may mean that the generated CDI specs for the devices are not as they should be.

Would you be able to confirm that running nvidia-smi in the workload container shows the expected results (since you seem to just be running a sleep)? You could also check the generated CDI specs at /var/run/cdi/ on the host.
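Concretely, something like the following (using the pod name from the earlier example):

    # confirm the workload container actually sees the GPU
    kubectl exec my-gpu-pod -- nvidia-smi -L

    # inspect the CDI specs generated on the node
    ls /var/run/cdi/
    cat /var/run/cdi/k8s.device-plugin.nvidia.com-gpu.json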

In general, I think the device plugin is going to need a similar change to properly handle the "split" driver and device root. I may have some time to look into it tomorrow, but I would assume that the specs need to be transformed in some way.

/cc @cdesiniotis

Dragoncell commented 8 months ago

@elezar Thanks for the suggestion.

With the current working configuration of the device plugin:

driver_root=/
container_driver_root=/host

export NVIDIA_DRIVER_ROOT=$driver_root
export CONTAINER_DRIVER_ROOT=$container_driver_root
export NVIDIA_CTK_PATH=/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk
export PATH="$PATH:/home/kubernetes/bin/nvidia/bin";
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kubernetes/bin/nvidia/lib64;

a) I tried running nvidia-smi in a workload container as below.

Simply running nvidia-smi without a GPU request works as expected:

kubectl run nvidia-smi --restart=Never --rm -i --tty --image nvidia/cuda:11.0.3-base-ubuntu20.04 -- nvidia-smi

Tue Feb 20 21:55:12 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   36C    P8              17W /  72W |      4MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
pod "nvidia-smi" deleted

However, if I run the below pod without the export PATH and LD_LIBRARY_PATH, it fails with an error like: Warning Failed 11s (x2 over 12s) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["bash", "-c"]
    args: 
    - |-
      # export PATH="$PATH:/home/kubernetes/bin/nvidia/bin";
      # export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kubernetes/bin/nvidia/lib64;
      nvidia-smi;
    resources:
      limits: 
        nvidia.com/gpu: "1"

I looked at the OCI spec of the container; the PATH looks like PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
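For reference, one way to inspect the merged env of the running container on a containerd node (assuming crictl and jq are available, and that containerd exposes the OCI spec under info.runtimeSpec):

    CID=$(sudo crictl ps --name my-gpu-container -q)
    sudo crictl inspect "$CID" | jq -r '.info.runtimeSpec.process.env[]' | grep -E '^(PATH|LD_LIBRARY_PATH)='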

b) On the node, I did see two files present:

/var/run/cdi $ ls
k8s.device-plugin.nvidia.com-gpu.json  management.nvidia.com-gpu.yaml

The config looks good to me:

{"cdiVersion":"0.5.0","kind":"k8s.device-plugin.nvidia.com/gpu","devices":[{"name":"GPU-0b182573-6996-0f5d-ad7d-96241c70d91c","containerEdits":{"deviceNodes":[{"path":"/dev/nvidia0","hostPath":"/dev/nvidia0"}]}}],"containerEdits":{"deviceNodes":[{"path":"/dev/nvidia-modeset","hostPath":"/dev/nvidia-modeset"},{"path":"/dev/nvidia-uvm-tools","hostPath":"/dev/nvidia-uvm-tools"},{"path":"/dev/nvidia-uvm","hostPath":"/dev/nvidia-uvm"},{"path":"/dev/nvidiactl","hostPath":"/dev/nvidiactl"}],"hooks":[{"hookName":"createContainer","path":"/home/kubernetes/bin/nvidia/toolkit/nvidia-ctk","args":["nvidia-ctk","hook","update-ldcache","--folder","/home/kubernetes/bin/nvidia/lib64"]}],"mounts":[{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-egl-gbm.so.1.1.0","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-egl-gbm.so.1.1.0","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libcuda.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libcuda.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-fbc.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-fbc.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-gtk3.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-gtk3.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-opencl.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-opencl.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libGLESv2_nvidia.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libGLESv2_nvidia.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-opticalflow.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-opticalflow.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-pkcs11-openssl3.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-pkcs11-openssl3.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-rtcore.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-rtcore.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-encode.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-encode.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-ml.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-ml.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-wayland-client.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-wayland-client.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libEGL_nvidia.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libEGL_nvidia.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-ngx.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-ngx.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kub
ernetes/bin/nvidia/lib64/libnvidia-ptxjitcompiler.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-ptxjitcompiler.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libGLESv1_CM_nvidia.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libGLESv1_CM_nvidia.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvcuvid.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvcuvid.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-glsi.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-glsi.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-pkcs11.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-pkcs11.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvoptix.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvoptix.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-glvkspirv.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-glvkspirv.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-nvvm.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-nvvm.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-tls.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-tls.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-vulkan-producer.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-vulkan-producer.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libGLX_nvidia.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libGLX_nvidia.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libcudadebugger.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libcudadebugger.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-allocator.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-allocator.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-eglcore.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-eglcore.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-gtk2.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-gtk2.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-cfg.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-cfg.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-glcore.so.535.104.12","containerPath":"/home/kubernetes/bin/nvidia/lib64/libnvidia-glcore.so.535.104.12","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/bin/nvidia-persistenced","containerPath":"/home/kubernetes/bin/nvidia/
bin/nvidia-persistenced","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/bin/nvidia-cuda-mps-control","containerPath":"/home/kubernetes/bin/nvidia/bin/nvidia-cuda-mps-control","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/bin/nvidia-cuda-mps-server","containerPath":"/home/kubernetes/bin/nvidia/bin/nvidia-cuda-mps-server","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/bin/nvidia-smi","containerPath":"/home/kubernetes/bin/nvidia/bin/nvidia-smi","options":["ro","nosuid","nodev","bind"]},{"hostPath":"/home/kubernetes/bin/nvidia/bin/nvidia-debugdump","containerPath":"/home/kubernetes/bin/nvidia/bin/nvidia-debugdump","options":["ro","nosuid","nodev","bind"]}]}}

Questions:

  1. Will it be possible to update the PATH in the OCI spec of the GPU container to include the desired /home/kubernetes/bin/nvidia path?
  2. Will configuring the device plugin driver_root as /home/kubernetes/bin/nvidia/bin help with this path issue?

Thanks