Hi, since version v20241109 I have been getting the error shared below. I initially thought my nvidia-device-plugin might be outdated, but updating to the latest version (0.17.0) didn't solve it. As a workaround, I reverted the Karpenter configuration to point to the previous version, v20241016. Does anyone have an idea what could be causing this?
Note: the configuration is almost bare-bones, using the original EKS GPU AMI; I only enable the Docker service in the user data for DinD scenarios.
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: error parsing IMEX info: unsupported IMEX channel value: all: unknown
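For triage, the chained message above can be split so the hook's own output and the rejected value stand out. This is a minimal Python sketch; the helper names `hook_stderr` and `rejected_imex_value` are hypothetical and not part of any NVIDIA or EKS tooling, and the regex assumes the message keeps the wording shown in this report.

```python
import re

# Error text copied from the report above.
ERROR = (
    "failed to create containerd task: failed to create shim task: "
    "OCI runtime create failed: runc create failed: "
    "unable to start container process: error during container init: "
    "error running hook #0: error running hook: exit status 1, "
    "stdout: , stderr: Auto-detected mode as 'legacy' "
    "nvidia-container-cli: error parsing IMEX info: "
    "unsupported IMEX channel value: all: unknown"
)


def hook_stderr(message):
    """Return the text after 'stderr: ' (the hook's own output), or None."""
    _, sep, tail = message.partition("stderr: ")
    return tail if sep else None


def rejected_imex_value(message):
    """Return the IMEX channel value the hook rejected, or None if absent."""
    match = re.search(r"unsupported IMEX channel value: ([^:\s]+)", message)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(hook_stderr(ERROR))
    print(rejected_imex_value(ERROR))
```

Run against the message in this report, the second helper returns `all`, which suggests the failure comes from how nvidia-container-cli parses the IMEX channel setting rather than from the device plugin version.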
Environment:
EKS Platform version (aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.8
Kubernetes version (aws eks describe-cluster --name <name> --query cluster.version): 1.30
Kernel (uname -a): Linux ip-xxx-xx-xx-xx.eu-central-1.compute.internal 5.10.227-219.884.amzn2.x86_64 1 SMP Tue Oct 22 16:38:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Release information (cat /etc/eks/release on a node):