awslabs / amazon-eks-ami

Packer configuration for building a custom EKS AMI
https://awslabs.github.io/amazon-eks-ami/
MIT No Attribution

bug(nvidia-container-toolkit): unsupported IMEX channel #2062

Open ugurgural opened 3 days ago

ugurgural commented 3 days ago

Hi, since version v20241109 I have been getting the error shown below. I initially thought my nvidia-device-plugin might be outdated, but updating to the latest version (0.17.0) didn't solve it. As a workaround, I reverted the Karpenter configuration to point to the previous version, v20241016. Does anyone have an idea what could be causing this?

Note: the configuration is almost bare-bones, using the original EKS GPU AMI; I only enable the docker service in the user data for DinD scenarios.

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: error parsing IMEX info: unsupported IMEX channel value: all: unknown

Environment:

Issacwww commented 3 days ago

I think this issue is related to https://github.com/NVIDIA/nvidia-container-toolkit/issues/797. We will have another release soon (the one after v20241115) that includes the updated nvidia-container-toolkit 1.17.2.
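
For anyone pinning AMI versions in the meantime, here is a minimal Python sketch that checks whether a release tag falls in the range mentioned in this thread. The bounds are assumptions taken only from the comments above (error first seen on v20241109, fix expected in the release after v20241115), not from official release notes, and the helper names `release_date` and `is_possibly_affected` are hypothetical.

```python
from datetime import date, datetime

# Assumptions drawn from this thread, not from official release notes:
FIRST_AFFECTED = "v20241109"  # first release where the reporter saw the error
LAST_AFFECTED = "v20241115"   # the fix is expected in the release after this one
KNOWN_GOOD = "v20241016"      # release the reporter rolled back to


def release_date(tag: str) -> date:
    """Parse an AMI release tag of the form vYYYYMMDD into a date."""
    return datetime.strptime(tag, "v%Y%m%d").date()


def is_possibly_affected(tag: str) -> bool:
    """Return True if the tag falls inside the range reported in this thread."""
    first = release_date(FIRST_AFFECTED)
    last = release_date(LAST_AFFECTED)
    return first <= release_date(tag) <= last


if __name__ == "__main__":
    for tag in (KNOWN_GOOD, FIRST_AFFECTED, LAST_AFFECTED):
        status = "possibly affected" if is_possibly_affected(tag) else "outside reported range"
        print(f"{tag}: {status}")
```

A tag outside the range is not guaranteed to be unaffected; the check only reflects what was reported here.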