NVIDIA / nvidia-container-runtime

NVIDIA container runtime
Apache License 2.0

nvidia-smi mapped into a container as a blank file when using k8s+containerd+nvidia-container-runtime #147

Closed · drtpotter closed this issue 3 years ago

drtpotter commented 3 years ago

Hi there,

I'm trying to use nvidia-container-runtime with containerd 1.4.4 under Kubernetes 1.21, but /usr/bin/nvidia-smi appears to be mapped into the container as a blank file. I'm running openSUSE Tumbleweed.

If I use CRI-O with the nvidia container runtime under k8s, my target container can see my RTX 3060 and I can run nvidia-smi, so I'm fairly sure the container itself is set up correctly. Unfortunately, other applications don't seem to work with CRI-O, so I'm trying to use k8s+containerd instead.

If I switch to containerd on its own, I can run this command fine:

sudo containerd-ctr -a /var/run/docker/containerd/containerd.sock run --rm --gpus 0 docker.io/nvidia/cuda:11.0-base nvidia-smi nvidia-smi

However, when using k8s+containerd, /usr/bin/nvidia-smi appears inside the container as a blank file. I must stress that the runtime works fine under k8s+CRI-O, so something seems to go wrong with making nvidia-smi available inside the container specifically under k8s+containerd. Here is my /etc/containerd/config.toml; I'm fairly sure I have followed the NVIDIA directions for patching the file to use the nvidia container runtime:

version = 2
root = "/var/lib/docker/containerd/daemon"
state = "/var/run/docker/containerd/daemon"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0

[grpc]
  address = "/var/run/docker/containerd/containerd.sock"
  tcp_address = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[ttrpc]
  address = ""
  uid = 0
  gid = 0

[debug]
  address = ""
  uid = 0
  gid = 0
  level = ""

[metrics]
  address = ""
  grpc_histogram = false

[cgroup]
  path = ""

[timeouts]
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[plugins]
  [plugins."io.containerd.gc.v1.scheduler"]
    pause_threshold = 0.02
    deletion_threshold = 0
    mutation_threshold = 100
    schedule_delay = "0s"
    startup_delay = "100ms"
  [plugins."io.containerd.grpc.v1.cri"]
    disable_tcp_service = true
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    stream_idle_timeout = "4h0m0s"
    enable_selinux = false
    selinux_category_range = 1024
    sandbox_image = "k8s.gcr.io/pause:3.2"
    stats_collect_period = 10
    systemd_cgroup = false
    enable_tls_streaming = false
    max_container_log_line_size = 16384
    disable_cgroup = false
    disable_apparmor = false
    restrict_oom_score_adj = false
    max_concurrent_downloads = 3
    disable_proc_mount = false
    unset_seccomp_profile = ""
    tolerate_missing_hugetlb_controller = true
    disable_hugetlb_controller = true
    ignore_image_defined_volumes = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      no_pivot = false
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
        privileged_without_host_devices = false
        base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
        privileged_without_host_devices = false
        base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
          base_runtime_spec = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v1"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      max_conf_num = 1
      conf_template = ""
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]
    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = ""
    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"
  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"
  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"
  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false
  [plugins."io.containerd.runtime.v1.linux"]
    shim = "containerd-shim"
    runtime = "runc"
    runtime_root = ""
    no_shim = false
    shim_debug = false
  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]
  [plugins."io.containerd.snapshotter.v1.devmapper"]
    root_path = ""
    pool_name = ""
    base_image_size = ""
    async_remove = false
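
To show the symptom concretely, the file can be inspected from outside the container with kubectl (the pod name cuda-test is hypothetical):

# the binary shows up at the expected path inside the pod, but is empty (zero bytes)
kubectl exec cuda-test -- ls -l /usr/bin/nvidia-smi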

Any help or suggestions as to why /usr/bin/nvidia-smi is mapped through as a blank file in the container would be most appreciated!

Kind regards, Toby

elezar commented 3 years ago

Hi @drtpotter, when launching a container on k8s+containerd, do you specify a runtime class to ensure that the nvidia runtime is selected? Note that the --gpus flag on the containerd-ctr command line works differently from how k8s runs a container through containerd.

Some suggestions:

- Set default_runtime_name = "nvidia" in the [plugins."io.containerd.grpc.v1.cri".containerd] section of /etc/containerd/config.toml, so that the nvidia runtime is used even when no runtime class is given.
- Alternatively, define a RuntimeClass for the nvidia runtime and reference it via runtimeClassName in the pod spec, as sketched below.
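
A minimal sketch of the RuntimeClass approach. The pod name gpu-test is illustrative, and the nvidia.com/gpu resource limit assumes the NVIDIA device plugin is deployed on the cluster:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia   # referenced from pod specs via runtimeClassName
handler: nvidia  # must match the runtime name under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes] in config.toml
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1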

drtpotter commented 3 years ago

Hi @elezar, yes, changing this line in /etc/containerd/config.toml

default_runtime_name = "runc"

to

default_runtime_name = "nvidia"

fixed the problem. I'd recommend integrating this change into the containerd section of the nvidia-container-runtime documentation at

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html
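
For completeness, this is the relevant section of /etc/containerd/config.toml after the change; containerd needs to be restarted afterwards (e.g. sudo systemctl restart containerd):

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "overlayfs"
  # use the nvidia runtime even for pods that do not specify a runtime class
  default_runtime_name = "nvidia"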

Thanks for the suggestions, happy to close this issue!

elezar commented 3 years ago

Thanks @drtpotter. I have added a task to update the docs. Glad that we were able to resolve the issue for you.