NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

containerd-config.patch cannot be applied #1781

Closed: wanghm closed this issue 1 year ago

wanghm commented 1 year ago

1. Issue or feature description

I'm following the install guide to install the NVIDIA Container Toolkit, but I failed to apply the patch file containerd-config.patch. My environment is: Ubuntu 22.04 LTS, Kubernetes 1.26, containerd 1.6.22.

It seems the config.toml patch in the install guide is too old: it cannot be applied to the default config generated by a recent containerd. Could you please update the install guide with a new patch file?

2. Steps to reproduce the issue

Generate the default config.toml:

sudo mkdir -p /etc/containerd && sudo containerd config default | sudo tee /etc/containerd/config.toml

Create the patch file (copied from the NVIDIA install guide):

cat <<EOF > containerd-config.patch
--- config.toml.orig    2020-12-18 18:21:41.884984894 +0000
+++ /etc/containerd/config.toml 2020-12-18 18:23:38.137796223 +0000
@@ -94,6 +94,15 @@
        privileged_without_host_devices = false
        base_runtime_spec = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
+            SystemdCgroup = true
+       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
+          privileged_without_host_devices = false
+          runtime_engine = ""
+          runtime_root = ""
+          runtime_type = "io.containerd.runc.v1"
+          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+            BinaryName = "/usr/bin/nvidia-container-runtime"
+            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".cni]
    bin_dir = "/opt/cni/bin"
    conf_dir = "/etc/cni/net.d"
EOF

Apply the patch:

patch config.toml < containerd-config.patch

Error message:

patching file config.toml
Hunk #1 FAILED at 94.
1 out of 1 hunk FAILED -- saving rejects to file config.toml.rej

elezar commented 1 year ago

We are in the process of updating our docs to use the nvidia-ctk runtime configure command instead of manually applying the patch.

Please run:

sudo nvidia-ctk runtime configure --runtime=containerd --config=/etc/containerd/config.toml

Adding a --dry-run flag will output the modified config instead of updating the file.
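The preview-then-apply flow described above can be scripted so the change is reviewed as a diff before anything is written. This is a minimal sketch, assuming the `--dry-run` output (the modified config) is printed on stdout and that the installed nvidia-ctk recognizes the containerd runtime; the function name `preview_nvidia_runtime` and the `NVIDIA_CTK` override variable are hypothetical, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Minimal sketch: show what `nvidia-ctk runtime configure --dry-run` would
# change in a containerd config, as a unified diff, without writing the file.
# Assumptions: the dry-run prints the modified config on stdout, and
# NVIDIA_CTK is a hypothetical override for the nvidia-ctk command name.
preview_nvidia_runtime() {
  local config="$1"
  local preview
  preview="$(mktemp)" || return 1
  # Stop early (and clean up) if nvidia-ctk itself fails, e.g. on an
  # "unrecognized runtime" error from an older version.
  if ! "${NVIDIA_CTK:-nvidia-ctk}" runtime configure --runtime=containerd \
      --config="$config" --dry-run > "$preview"; then
    rm -f "$preview"
    return 1
  fi
  # diff exits with status 1 when the files differ, which is expected here.
  diff -u "$config" "$preview" || true
  rm -f "$preview"
}
```

For example, `preview_nvidia_runtime /etc/containerd/config.toml` prints only the lines nvidia-ctk would add. Once the diff looks right, rerun the command without `--dry-run` (as root) to write the file, then restart containerd, e.g. `sudo systemctl restart containerd` on a systemd host, so the new runtime is loaded.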

wanghm commented 1 year ago

@elezar Thank you very much. I just tried the above command, but got:

sudo nvidia-ctk runtime configure --runtime=containerd --config=/etc/containerd/config.toml
ERRO[0000] unrecognized runtime 'containerd'

elezar commented 1 year ago

Which version of nvidia-ctk do you have installed? Support for containerd will be included in the 1.14.0 release (the 1.14.0-rc.2 release is already available from our public experimental repos).
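The version requirement mentioned above can be checked before running the configure command. This is a minimal sketch assuming GNU `sort -V` (available on Ubuntu) and a `nvidia-ctk --version` output line of the form `NVIDIA Container Toolkit CLI version 1.14.0-rc.2`; `parse_ctk_version` and `supports_containerd` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Minimal sketch: decide whether an installed nvidia-ctk is new enough for
# `runtime configure --runtime=containerd` (1.14.0 or later).
# The helper names below are hypothetical, not part of the toolkit.

# Reads `nvidia-ctk --version` output on stdin and prints the last field of
# the "CLI version" line, e.g. "1.14.0-rc.2".
parse_ctk_version() {
  awk '/CLI version/ { print $NF; exit }'
}

# Succeeds if the given version is >= 1.14.0. A pre-release suffix such as
# "-rc.2" is stripped first, so 1.14.0 release candidates also pass.
supports_containerd() {
  local version="${1%%-*}"
  local min="1.14.0"
  # sort -V orders version strings; if min sorts first, version >= min.
  [ "$(printf '%s\n' "$min" "$version" | sort -V | head -n 1)" = "$min" ]
}
```

Usage: `supports_containerd "$(nvidia-ctk --version | parse_ctk_version)" || echo "nvidia-ctk is older than 1.14.0"`. Stripping the suffix is a deliberate choice: under strict semantic versioning 1.14.0-rc.2 sorts before 1.14.0, but the release candidates already include containerd support.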

wanghm commented 1 year ago

It was 1.13.5. I switched to the experimental repository and reinstalled it. Now it works!

# nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.14.0-rc.2
commit: e4fee325cbdd31815b7e4796d493ccb58082fa22

# diff -u config.toml_bk config.toml
--- config.toml_bk  2023-08-24 10:02:51.852912445 +0800
+++ config.toml 2023-08-24 10:03:14.192965286 +0800
@@ -99,6 +99,31 @@

       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

+        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
+          base_runtime_spec = ""
+          cni_conf_dir = ""
+          cni_max_conf_num = 0
+          container_annotations = []
+          pod_annotations = []
+          privileged_without_host_devices = false
+          runtime_engine = ""
+          runtime_path = ""
+          runtime_root = ""
+          runtime_type = "io.containerd.runc.v2"
+
+          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+            BinaryName = "/usr/bin/nvidia-container-runtime"
+            CriuImagePath = ""
+            CriuPath = ""
+            CriuWorkPath = ""
+            IoGid = 0
+            IoUid = 0
+            NoNewKeyring = false
+            NoPivotRoot = false
+            Root = ""
+            ShimCgroup = ""
+            SystemdCgroup = false
+
         [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
           base_runtime_spec = ""
           cni_conf_dir = ""
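To confirm that a config file ended up with the nvidia runtime entries shown in the diff above, a rough text check is enough for a quick sanity test. This is a minimal sketch, assuming the section header and `BinaryName` line appear exactly as in that diff; `has_nvidia_runtime` is a hypothetical name, and grep is not a TOML parser, so it cannot tell which runtime section a matching `BinaryName` line belongs to.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a containerd config.toml contains the nvidia
# runtime section and the nvidia-container-runtime BinaryName.
# has_nvidia_runtime is a hypothetical helper; this is a plain text match.
has_nvidia_runtime() {
  local config="$1"
  grep -qF '[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]' "$config" &&
    grep -qF 'BinaryName = "/usr/bin/nvidia-container-runtime"' "$config"
}
```

For example, `has_nvidia_runtime /etc/containerd/config.toml && sudo systemctl restart containerd` restarts containerd (assuming it runs as a systemd service) only after the check passes; containerd reads config.toml at startup, so a restart is needed before the nvidia runtime can be used.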

wanghm commented 1 year ago

@elezar Thanks for your help. I'll close this issue.

If possible, could you let me know the approximate schedule for the GA release of nvidia-ctk 1.14? I'll be doing this in a customer's production environment, so I'd like to use a GA version rather than an RC.