NVIDIA / nvidia-container-runtime

NVIDIA container runtime
Apache License 2.0

Running nvidia-container-runtime with podman is blowing up. #85

Closed rhatdan closed 11 months ago

rhatdan commented 4 years ago
  1. Issue or feature description: rootless and rootful podman do not work with the nvidia plugin.

  2. Steps to reproduce the issue: install the nvidia plugin, configure it to run with podman, execute the podman command, and check whether the devices are configured correctly.

  3. Information to attach (optional if deemed irrelevant)

     - Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
     - Kernel version from uname -a: Fedora 30 and later
     - Any relevant kernel output lines from dmesg
     - Driver information from nvidia-smi -a
     - Docker version from docker version
     - NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
     - NVIDIA container library version from nvidia-container-cli -V
     - NVIDIA container library logs (see troubleshooting)
     - Docker command, image and tag used

I am reporting this based on other users complaining. This is what they said.

We discovered that the Ubuntu 18.04 machine needed a configuration change to get rootless working with nvidia: "no-cgroups = true" was set in /etc/nvidia-container-runtime/config.toml. Unfortunately this config change did not work on CentOS 7, but it did change the rootless error to: nvidia-container-cli: initialization error: cuda error: unknown error

This config change breaks podman running as root, with the error: Failed to initialize NVML: Unknown Error

Interestingly, root on Ubuntu gets the same error even though rootless works.

rhatdan commented 4 years ago

The Podman team would like to work with you guys to get this to work well in both rootful and rootless containers if possible. But we need someone to work with.

rhatdan commented 4 years ago

@mheon @baude FYI

zvonkok commented 4 years ago

@sjug FYI

RenaudWasTaken commented 4 years ago

Hello!

@rhatdan do you mind filling out the following issue template: https://github.com/NVIDIA/nvidia-docker/blob/master/.github/ISSUE_TEMPLATE.md

Thanks!

nvjmayo commented 4 years ago

I can work with the podman team.

rhatdan commented 4 years ago

@hholst80 FYI

rhatdan commented 4 years ago

https://github.com/containers/libpod/issues/3659

eaepstein commented 4 years ago

@nvjmayo Thanks for the suggestions. Some good news and less good.

This works rootless:

podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d nvcr.io/nvidia/cuda nvidia-smi

The same command continues to fail with the image docker.io/nvidia/cuda.

In fact, rootless works with or without /usr/share/containers/oci/hooks.d/01-nvhook.json installed when using the image nvcr.io/nvidia/cuda.

Running as root continues to fail when no-cgroups = true for either image, returning: Failed to initialize NVML: Unknown Error

rhatdan commented 4 years ago

Strange, I would not expect podman to run a hook that did not have a JSON file describing it.

nvjmayo commented 4 years ago

@eaepstein I'm still struggling to reproduce the issue you see. Using docker.io/nvidia/cuda also works for me with the hooks dir.

$ podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda nvidia-smi
Tue Oct 22 21:35:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:65:00.0 N/A |                  N/A |
| 50%   38C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

without the hook I would expect to see a failure roughly like:

Error: time="2019-10-22T14:35:14-07:00" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"nvidia-smi\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": OCI runtime command not found error

This is because the libraries and tools get installed by the hook in order to match the host drivers (an unfortunate limitation of tightly coupled driver and library releases).

I think this is a configuration issue and not an issue with the container image (docker.io/nvidia/cuda vs. nvcr.io/nvidia/cuda).

Reviewing my earlier posts, I recommend changing my 01-nvhook.json and removing NVIDIA_REQUIRE_CUDA=cuda>=10.1 from it. My assumption was that everyone has the latest CUDA install, which was kind of a silly assumption on my part. The CUDA version doesn't have to be specified, and you can leave this environment variable out of your setup. It was an artifact of my earlier experiments.
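
For reference, a hook definition of that shape looks roughly like the sketch below. This is only an illustration: the path and args are assumed from a typical nvidia-container-toolkit install rather than copied from any one system.

$ cat /usr/share/containers/oci/hooks.d/01-nvhook.json
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-toolkit",
    "args": ["nvidia-container-toolkit", "prestart"]
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}

If your copy has an "env" array inside "hook" with an entry such as "NVIDIA_REQUIRE_CUDA=cuda>=10.1", that is the line to remove.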

eaepstein commented 4 years ago

@nvjmayo we started from scratch with a new machine (CentOS Linux release 7.7.1908) and both docker.io and nvcr.io images are working for us now too. And --hooks-dir must now be specified for both to work. Thanks for the help!

eaepstein commented 4 years ago

@rhatdan @nvjmayo Turns out that getting rootless podman working with nvidia on centos 7 is a bit more complicated, at least for us.

Here is our scenario on a brand-new CentOS 7.7 machine:

  1. Run nvidia-smi with rootless podman.
     Result: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\n\\"\"

  2. Run podman with user=root.
     Result: nvidia-smi works.

  3. Run podman rootless.
     Result: nvidia-smi works!

  4. Reboot the machine, then run podman rootless.
     Result: fails again with the same error as in step 1.

Conclusion: running the nvidia container with podman as root changes the environment so that rootless works. The environment is cleared on reboot.

One other comment: podman as root and rootless podman cannot run with the same /etc/nvidia-container-runtime/config.toml: no-cgroups must be false for root and true for rootless.

rhatdan commented 4 years ago

If the nvidia hook is doing any privileged operations, like modifying /dev and adding device nodes, then this will not work with rootless. (In rootless mode, all processes run with the user's UID. Probably when you run rootful, it does the privileged operations, so the next time you run rootless, those activities do not need to be done.)

I would suggest that, for rootless systems, the /dev and nvidia setup be done in a systemd unit file, so the system is preconfigured and the rootless jobs will then work fine.

eaepstein commented 4 years ago

After running nvidia/cuda with rootful podman, the following exist:

crw-rw-rw-. 1 root root 195, 254 Oct 25 09:11 nvidia-modeset
crw-rw-rw-. 1 root root 195, 255 Oct 25 09:11 nvidiactl
crw-rw-rw-. 1 root root 195,   0 Oct 25 09:11 nvidia0
crw-rw-rw-. 1 root root 241,   1 Oct 25 09:11 nvidia-uvm-tools
crw-rw-rw-. 1 root root 241,   0 Oct 25 09:11 nvidia-uvm

None of these devices exist after boot. Running nvidia-smi rootless (no podman) creates:

crw-rw-rw-. 1 root root 195,   0 Oct 25 13:40 nvidia0
crw-rw-rw-. 1 root root 195, 255 Oct 25 13:40 nvidiactl

I created the other three entries using "sudo mknod -m 666 etc..." but that is not enough to run rootless. Something else is needed in the environment.

Running nvidia/cuda with rootful podman at boot would work, but it is not pretty.

Thanks for the suggestion
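
For anyone who wants to try recreating those nodes by hand, the commands are along these lines. This is only a sketch: the major number for the UVM devices (241 in the listing above) is assigned dynamically, so check /proc/devices on your own machine first.

$ grep nvidia /proc/devices   # confirm the major numbers before creating nodes
$ sudo mknod -m 666 /dev/nvidia-uvm c 241 0
$ sudo mknod -m 666 /dev/nvidia-uvm-tools c 241 1
$ sudo mknod -m 666 /dev/nvidia-modeset c 195 254

As noted above, creating the nodes was not sufficient by itself in this case.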

flx42 commented 4 years ago

This behavior is documented in our installation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications

From a user namespace you can't mknod or use nvidia-modprobe. But if this binary is present and can be called in a context where setuid works, it's an option.

There is already nvidia-persistenced as a systemd unit file, but it won't load the nvidia_uvm kernel module or create the device files, IIRC.

Another option is to use udev rules, which is what Ubuntu is doing:

$ cat /lib/udev/rules.d/71-nvidia.rules 
[...]

# Load and unload nvidia-uvm module
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-uvm"
ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-uvm"

# This will create the device nvidia device nodes
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/usr/bin/nvidia-smi"

# Create the device node for the nvidia-uvm module
ACTION=="add", DEVPATH=="/module/nvidia_uvm", SUBSYSTEM=="module", RUN+="/sbin/create-uvm-dev-node"

rhatdan commented 4 years ago

Udev rules makes sense to me.

eaepstein commented 4 years ago

@flx42 sudo'ing the setup script in "4.5. Device Node Verification" is the only thing needed to get rootless nvidia/cuda containers running for us. It created the following devices:

crw-rw-rw-. 1 root root 195,   0 Oct 27 20:38 nvidia0
crw-rw-rw-. 1 root root 195, 255 Oct 27 20:38 nvidiactl
crw-rw-rw-. 1 root root 241,   0 Oct 27 20:38 nvidia-uvm

The udev file only created the first two and was not sufficient by itself. We'll go with a unit file for the setup script.

Many thanks for your help.
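
A oneshot unit for that can be as small as the sketch below, assuming the device-node script from section 4.5 of the CUDA installation guide has been saved as /usr/local/sbin/nvidia-dev-nodes.sh (both that path and the unit name are made up for illustration):

$ cat /etc/systemd/system/nvidia-dev-nodes.service
[Unit]
Description=Create NVIDIA device nodes for rootless containers
After=systemd-modules-load.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nvidia-dev-nodes.sh

[Install]
WantedBy=multi-user.target

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now nvidia-dev-nodes.service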

qhaas commented 4 years ago

Thanks guys. With insight from this issue and others, I was able to get podman working with my Quadro in EL7 after installing the 'nvidia-container-toolkit' package, using:

sudo podman run --privileged --rm --hooks-dir /usr/share/containers/oci/hooks.d docker.io/nvidia/cudagl:10.1-runtime-centos7 nvidia-smi

Once the dust settles on how to get GPU support in rootless podman in EL7, a step-by-step guide would make for a great blog post and/or entry into the podman and/or nvidia documentation.

dagrayvid commented 4 years ago

Hello @nvjmayo and @rhatdan. I'm wondering if there is an update on this issue, or this one, for how to access NVIDIA GPUs from containers run rootless with podman.

On RHEL 8.1, with the default /etc/nvidia-container-runtime/config.toml and running containers as root, GPU access works as expected. Rootless does not work by default; it fails with cgroup-related errors (as expected).

After modifying the config.toml file (setting no-cgroups = true and changing the debug log file), rootless works. However, these changes make GPU access fail in containers run as root, with the error "Failed to initialize NVML: Unknown Error."

Please let me know if there is any recent documentation on how to do this beyond these two issues.

jamescassell commented 4 years ago

Steps to get it working on RHEL 8.1:

  1. Install the NVIDIA drivers and make sure nvidia-smi works on the host.
  2. Install nvidia-container-toolkit from the repos at
     baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
     baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
     (a sketch of a matching .repo file follows this list).
  3. Modify /etc/nvidia-container-runtime/config.toml and change these values:
     [nvidia-container-cli]
     #no-cgroups = false
     no-cgroups = true
     [nvidia-container-runtime]
     #debug = "/var/log/nvidia-container-runtime.log"
     debug = "~/./local/nvidia-container-runtime.log"
  4. Run it rootless as:
     podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi
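
A .repo file matching the baseurls in step 2 might look like the sketch below; the file name, gpgcheck settings, and gpgkey URLs are assumptions, so compare against the official libnvidia-container install instructions before using it.

$ cat /etc/yum.repos.d/nvidia-container.repo
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey

[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey

$ sudo yum install -y nvidia-container-toolkit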

zvonkok commented 4 years ago

/cc @dagrayvid

dagrayvid commented 4 years ago

Thanks @jamescassell.

I repeated those steps on RHEL8.1, and nvidia-smi works as expected when running rootless. However, once those changes are made, I am unable to run nvidia-smi in a container run as root. Is this behaviour expected, or is there some change in CLI flags needed when running as root? Running as root did work before making these changes.

Is there a way to configure a system so that we can use GPUs with podman as both root and a non-root user?

andrewssobral commented 4 years ago

I can't run podman rootless with the GPU; can someone help me?

docker run --runtime=nvidia --privileged nvidia/cuda nvidia-smi works fine, but podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi crashes; the same happens for sudo podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi.

Output:

$ podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi
2020/04/03 13:34:52 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
Error: `/usr/bin/nvidia-container-runtime start e3ccb660bf27ce0858ee56476e58b53cd3dc900e8de80f08d10f3f844c0e9f9a` failed: exit status 1

But, runc exists:

$ whereis runc
runc: /usr/bin/runc
$ whereis docker-runc
docker-runc:
$ podman --version
podman version 1.8.2
$ cat ~/.config/containers/libpod.conf
# libpod.conf is the default configuration file for all tools using libpod to
# manage containers

# Default transport method for pulling and pushing for images
image_default_transport = "docker://"

# Paths to look for the conmon container manager binary.
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
conmon_path = [
            "/usr/libexec/podman/conmon",
            "/usr/local/libexec/podman/conmon",
            "/usr/local/lib/podman/conmon",
            "/usr/bin/conmon",
            "/usr/sbin/conmon",
            "/usr/local/bin/conmon",
            "/usr/local/sbin/conmon",
            "/run/current-system/sw/bin/conmon",
]

# Environment variables to pass into conmon
conmon_env_vars = [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]

# CGroup Manager - valid values are "systemd" and "cgroupfs"
#cgroup_manager = "systemd"

# Container init binary
#init_path = "/usr/libexec/podman/catatonit"

# Directory for persistent libpod files (database, etc)
# By default, this will be configured relative to where containers/storage
# stores containers
# Uncomment to change location from this default
#static_dir = "/var/lib/containers/storage/libpod"

# Directory for temporary files. Must be tmpfs (wiped after reboot)
#tmp_dir = "/var/run/libpod"
tmp_dir = "/run/user/1000/libpod/tmp"

# Maximum size of log files (in bytes)
# -1 is unlimited
max_log_size = -1

# Whether to use chroot instead of pivot_root in the runtime
no_pivot_root = false

# Directory containing CNI plugin configuration files
cni_config_dir = "/etc/cni/net.d/"

# Directories where the CNI plugin binaries may be located
cni_plugin_dir = [
               "/usr/libexec/cni",
               "/usr/lib/cni",
               "/usr/local/lib/cni",
               "/opt/cni/bin"
]

# Default CNI network for libpod.
# If multiple CNI network configs are present, libpod will use the network with
# the name given here for containers unless explicitly overridden.
# The default here is set to the name we set in the
# 87-podman-bridge.conflist included in the repository.
# Not setting this, or setting it to the empty string, will use normal CNI
# precedence rules for selecting between multiple networks.
cni_default_network = "podman"

# Default libpod namespace
# If libpod is joined to a namespace, it will see only containers and pods
# that were created in the same namespace, and will create new containers and
# pods in that namespace.
# The default namespace is "", which corresponds to no namespace. When no
# namespace is set, all containers and pods are visible.
#namespace = ""

# Default infra (pause) image name for pod infra containers
infra_image = "k8s.gcr.io/pause:3.1"

# Default command to run the infra container
infra_command = "/pause"

# Determines whether libpod will reserve ports on the host when they are
# forwarded to containers. When enabled, when ports are forwarded to containers,
# they are held open by conmon as long as the container is running, ensuring that
# they cannot be reused by other programs on the host. However, this can cause
# significant memory usage if a container has many ports forwarded to it.
# Disabling this can save memory.
#enable_port_reservation = true

# Default libpod support for container labeling
# label=true

# The locking mechanism to use
lock_type = "shm"

# Number of locks available for containers and pods.
# If this is changed, a lock renumber must be performed (e.g. with the
# 'podman system renumber' command).
num_locks = 2048

# Directory for libpod named volumes.
# By default, this will be configured relative to where containers/storage
# stores containers.
# Uncomment to change location from this default.
#volume_path = "/var/lib/containers/storage/volumes"

# Selects which logging mechanism to use for Podman events.  Valid values
# are `journald` or `file`.
# events_logger = "journald"

# Specify the keys sequence used to detach a container.
# Format is a single character [a-Z] or a comma separated sequence of
# `ctrl-<value>`, where `<value>` is one of:
# `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_`
#
# detach_keys = "ctrl-p,ctrl-q"

# Default OCI runtime
runtime = "runc"

# List of the OCI runtimes that support --format=json.  When json is supported
# libpod will use it for reporting nicer errors.
runtime_supports_json = ["crun", "runc"]

# List of all the OCI runtimes that support --cgroup-manager=disable to disable
# creation of CGroups for containers.
runtime_supports_nocgroups = ["crun"]

# Paths to look for a valid OCI runtime (runc, runv, etc)
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
[runtimes]
runc = [
            "/usr/bin/runc",
            "/usr/sbin/runc",
            "/usr/local/bin/runc",
            "/usr/local/sbin/runc",
            "/sbin/runc",
            "/bin/runc",
            "/usr/lib/cri-o-runc/sbin/runc",
            "/run/current-system/sw/bin/runc",
]

crun = [
                "/usr/bin/crun",
                "/usr/sbin/crun",
                "/usr/local/bin/crun",
                "/usr/local/sbin/crun",
                "/sbin/crun",
                "/bin/crun",
                "/run/current-system/sw/bin/crun",
]

nvidia = ["/usr/bin/nvidia-container-runtime"]

# Kata Containers is an OCI runtime, where containers are run inside lightweight
# Virtual Machines (VMs). Kata provides additional isolation towards the host,
# minimizing the host attack surface and mitigating the consequences of
# containers breakout.
# Please notes that Kata does not support rootless podman yet, but we can leave
# the paths below blank to let them be discovered by the $PATH environment
# variable.

# Kata Containers with the default configured VMM
kata-runtime = [
    "/usr/bin/kata-runtime",
]

# Kata Containers with the QEMU VMM
kata-qemu = [
    "/usr/bin/kata-qemu",
]

# Kata Containers with the Firecracker VMM
kata-fc = [
    "/usr/bin/kata-fc",
]

# The [runtimes] table MUST be the last thing in this file.
# (Unless another table is added)
# TOML does not provide a way to end a table other than a further table being
# defined, so every key hereafter will be part of [runtimes] and not the main
# config.
$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
debug = "/tmp/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
debug = "/tmp/nvidia-container-runtime.log
$ cat /tmp/nvidia-container-runtime.log
2020/04/03 13:23:02 Running /usr/bin/nvidia-container-runtime
2020/04/03 13:23:02 Using bundle file: /home/andrews/.local/share/containers/storage/vfs-containers/614cb26f8f4719e3aba56be2e1a6dc29cd91ae760d9fe3bf83d6d1b24becc638/userdata/config.json
2020/04/03 13:23:02 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2020/04/03 13:23:02 Prestart hook added, executing runc
2020/04/03 13:23:02 Looking for "docker-runc" binary
2020/04/03 13:23:02 "docker-runc" binary not found
2020/04/03 13:23:02 Looking for "runc" binary
2020/04/03 13:23:02 Runc path: /usr/bin/runc
2020/04/03 13:23:09 Running /usr/bin/nvidia-container-runtime
2020/04/03 13:23:09 Command is not "create", executing runc doing nothing
2020/04/03 13:23:09 Looking for "docker-runc" binary
2020/04/03 13:23:09 "docker-runc" binary not found
2020/04/03 13:23:09 Looking for "runc" binary
2020/04/03 13:23:09 ERROR: find runc path: exec: "runc": executable file not found in $PATH
2020/04/03 13:31:06 Running nvidia-container-runtime
2020/04/03 13:31:06 Command is not "create", executing runc doing nothing
2020/04/03 13:31:06 Looking for "docker-runc" binary
2020/04/03 13:31:06 "docker-runc" binary not found
2020/04/03 13:31:06 Looking for "runc" binary
2020/04/03 13:31:06 Runc path: /usr/bin/runc
$ nvidia-container-runtime --version
runc version 1.0.0-rc8
commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
spec: 1.0.1-dev
NVRM version:   440.64.00
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          GeForce RTX 2070
Brand:          GeForce
GPU UUID:       GPU-22dfd02e-a668-a6a6-a90a-39d6efe475ee
Bus Location:   00000000:01:00.0
Architecture:   7.5
$ docker version
Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        2d0083d
 Built:             Thu Jun 27 17:56:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

jamescassell commented 4 years ago

See particularly step 4. https://github.com/NVIDIA/nvidia-container-runtime/issues/85#issuecomment-604931556

rhatdan commented 4 years ago

This looks like the nvidia plugin is searching for a hard coded path to runc?

andrewssobral commented 4 years ago

[updated] Hi @jamescassell, unfortunately this does not work for me (same error when using sudo).

$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ --runtime=nvidia nvidia/cuda nvidia-smi
2020/04/03 17:33:06 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
2020/04/03 17:33:06 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
Error: `/usr/bin/nvidia-container-runtime start 060398d97299ee033e8ebd698a11c128bd80ce641dd389976ca43a34b26abab3` failed: exit status 1

jamescassell commented 4 years ago

Hi @jamescassell, unfortunately this does not work for me.

$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda nvidia-smi
Error: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": OCI runtime command not found error

Did you make the other changes described? I'd hit the same error until making the config changes.

andrewssobral commented 4 years ago

@jamescassell yes, see https://github.com/NVIDIA/nvidia-container-runtime/issues/85#issuecomment-608469598

jamescassell commented 4 years ago

Not sure if it's relevant but looks like you're missing a quote: debug = "/tmp/nvidia-container-runtime.log

andrewssobral commented 4 years ago

@jamescassell $ sudo nano /etc/nvidia-container-runtime/config.toml

rhatdan commented 4 years ago

I think this is a podman issue: Podman is not passing $PATH down to conmon when it executes it (https://github.com/containers/libpod/pull/5712). I am not sure whether conmon then passes the PATH environment down to the OCI runtime either.

andrewssobral commented 4 years ago

@rhatdan yes, I will check this PR: https://github.com/containers/libpod/pull/5712 Thanks!

coreyryanhanson commented 4 years ago

I had a major issue with this error message popping up when trying to change my container user ID while adding the hook that was made to fix the rootless problem.

Error: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": OCI runtime error

But I've since learned that this particular behavior is quite quirky: where I thought I had pinpointed it, it now seems to work if there is a call to the container using sudo (the container wouldn't work, but the subsequent command did). Eagerly awaiting an update where the root (no pun intended) of this nvidia container problem gets addressed.

andrewssobral commented 4 years ago

Hi @rhatdan, answering your previous question (https://github.com/containers/libpod/pull/5712#issuecomment-608516075): I was able to install the new version of podman and it works fine with my GPU. However, I am getting this strange behavior at the end of the execution, please see:

andrews@deeplearning:~/Projects$ podman run -it --rm --runtime=nvidia --privileged nvidia/cuda:10.0-cudnn7-runtime nvidia-smi 
Mon May 18 21:30:17 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
| 37%   30C    P8     9W / 175W |    166MiB /  7979MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
2020/05/18 23:30:18 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
ERRO[0003] Error removing container 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65: error removing container 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65 from runtime: `/usr/bin/nvidia-container-runtime delete --force 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65` failed: exit status 1 
andrews@deeplearning:~$ podman --version
podman version 1.9.2
andrews@deeplearning:~$ cat /tmp/nvidia-container-runtime.log
2020/05/18 23:47:47 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:47 Using bundle file: /home/andrews/.local/share/containers/storage/vfs-containers/3add1cc2bcb9cecde045877d9a0e4d3ed9f64d304cd5cb07fd0e072bf163a170/userdata/config.json
2020/05/18 23:47:47 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2020/05/18 23:47:47 Prestart hook added, executing runc
2020/05/18 23:47:47 Looking for "docker-runc" binary
2020/05/18 23:47:47 Runc path: /usr/bin/docker-runc
2020/05/18 23:47:48 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:48 Command is not "create", executing runc doing nothing
2020/05/18 23:47:48 Looking for "docker-runc" binary
2020/05/18 23:47:48 Runc path: /usr/bin/docker-runc
2020/05/18 23:47:48 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:48 Command is not "create", executing runc doing nothing
2020/05/18 23:47:48 Looking for "docker-runc" binary
2020/05/18 23:47:48 "docker-runc" binary not found
2020/05/18 23:47:48 Looking for "runc" binary
2020/05/18 23:47:48 ERROR: find runc path: exec: "runc": executable file not found in $PATH
andrews@deeplearning:~$ nvidia-container-runtime --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
andrews@deeplearning:~$ whereis runc
runc: /usr/bin/runc
andrews@deeplearning:~$ whereis docker-runc
docker-runc: /usr/bin/docker-runc

Do you know what it could be?

rhatdan commented 4 years ago

The error you are getting looks like the $PATH was not being passed into your OCI runtime.

andrewssobral commented 4 years ago

Yes, it's strange...

qhaas commented 4 years ago

  1. Modify /etc/nvidia-container-runtime/config.toml and change these values: ...
  2. run it rootless as podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi

This did the trick for me, thanks. I'm pondering the user/process isolation ramifications of these changes on a multi-user system. Hopefully, RH/NVDA can get this as elegant as Docker's --gpus=all without significantly degrading the security benefits of rootless podman over docker...

rhatdan commented 4 years ago

If you leave SELinux enabled, what AVCs are you seeing?

Davidnet commented 4 years ago

Amazing work! I was able to run GPU-enabled containers on Fedora 32 using the centos8 repos, only modifying /etc/nvidia-container-runtime/config.toml to set no-cgroups = true. I was wondering, what are the implications of not using the hooks-dir?

Thanks

[screenshot]

Update: Checking a tensorflow image, works flawlessly:

[screenshot]

Podman rootless with version 1.9.3

zeroepoch commented 4 years ago

For anyone who is looking to have rootless "nvidia-docker" be more or less seamless with podman I would suggest the following changes:

$ cat ~/.config/containers/libpod.conf 
hooks_dir = ["/usr/share/containers/oci/hooks.d", "/etc/containers/oci/hooks.d"]
label = false
$ grep no-cgroups /etc/nvidia-container-runtime/config.toml 
no-cgroups = true

After the above changes on Fedora 32 I can run nvidia-smi using just:

$ podman run -it --rm nvidia/cuda:10.2-base nvidia-smi
Fri Jun 26 22:49:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:08:00.0  On |                  N/A |
| 41%   35C    P8     5W / 280W |    599MiB / 24186MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The only annoyance is needing to edit /etc/nvidia-container-runtime/config.toml whenever there is a package update for nvidia-container-toolkit, which fortunately doesn't happen too often. If there were some way to make changes to config.toml persistent across updates, or a user config file (without using some hack like chattr +i), then this process would be really smooth.

Maybe in the future a more targeted approach for disabling SELinux will come along that is more secure than just disabling labeling completely for lazy people like myself. I only run a few GPU-based containers here and there so I'm personally not too concerned.

mjlbach commented 4 years ago

@zeroepoch You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh
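
Building and loading a local policy module like the one in that script generally comes down to the following; the file names here are placeholders, and the actual policy source comes from the linked repository (the nvidia_container_t type used later in this thread is defined by that policy):

$ checkmodule -M -m -o nvidia-container.mod nvidia-container.te
$ semodule_package -o nvidia-container.pp -m nvidia-container.mod
$ sudo semodule -i nvidia-container.pp
$ podman run --rm --security-opt label=type:nvidia_container_t nvidia/cuda:10.2-base nvidia-smi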

invexed commented 4 years ago

The instructions here worked for me on Fedora 32, however the problem reappears if I specify --userns keep-id:

Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

Is that expected behaviour?

Davidnet commented 4 years ago

The instructions here worked for me on Fedora 32, however the problem reappears if I specify --userns keep-id:

Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

Is that expected behaviour?

Make sure you have modified the file at: /etc/nvidia-container-runtime/config.toml

Every time the nvidia-container packages are updated, the default values are reset, and you should change the values of:

#no-cgroups=false
no-cgroups = true

mjlbach commented 4 years ago

@Davidnet Even after the above modification, I am able to reproduce @invexed's error if I try to run the cuda-11 containers. Note the latest tag currently points to cuda 11.

$ podman run --rm --security-opt=label=disable nvidia/cuda:11.0-base-rc /usr/bin/nvidia-smi
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

But not when trying to run a cuda 10.2 container or lower

$ podman run --rm --security-opt=label=disable nvidia/cuda:10.2-base /usr/bin/nvidia-smi
Sun Jul 12 15:57:40 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   60C    P0    37W / 230W |    399MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

invexed commented 4 years ago

Make sure you have modified the file at: /etc/nvidia-container-runtime/config.toml

Thanks for the reply. I have indeed modified this file. The container runs with podman run --rm --security-opt label=disable -u 0:0 container, but podman run --rm --security-opt label=disable --userns keep-id -u $(id -u):$(id -g) container results in the above error.

EDIT: I have CUDA 10.2 installed:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8    N/A /  N/A |     42MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1565      G   /usr/libexec/Xorg                             20MiB |
|    0      2013      G   /usr/libexec/Xorg                             20MiB |
+-----------------------------------------------------------------------------+

zeroepoch commented 4 years ago

EDIT: I have CUDA 10.2 installed:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8    N/A /  N/A |     42MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1565      G   /usr/libexec/Xorg                             20MiB |
|    0      2013      G   /usr/libexec/Xorg                             20MiB |
+-----------------------------------------------------------------------------+

You need a 450 driver to run CUDA 11.0 containers. The host CUDA version (or even none at all) doesn't matter, but the driver version does when running a CUDA container. nvidia-docker makes this error more obvious compared to podman. After updating your driver you should be able to run the container.
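
As a quick sanity check before pulling a CUDA 11 image, you can query the installed driver directly; the output below is just an example matching the 440.100 driver shown above:

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
440.100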

invexed commented 4 years ago

You need a 450 driver to run CUDA 11.0 containers. The host CUDA version (or even none at all) doesn't matter, but the driver version does when running a CUDA container. nvidia-docker makes this error more obvious compared to podman. After updating your driver you should be able to run the container.

Apologies for the confusion, but I'm actually trying to run a CUDA 10.0.130 container. Updating the driver may fix @mjlbach's problem though.

To be more precise, I'm installing CUDA via https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux within an image based on archlinux.

podman run --rm --security-opt label=disable -u $(id -u):$(id -g) --userns keep-id container

triggers Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error, but

podman run --rm --security-opt label=disable -u 0:0 container

does not. The problem seems to be related to the specification of --userns keep-id.

qhaas commented 4 years ago

You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh

Interesting. Per the link in that script to the DGX project, it looks like NVIDIA has already solved the SELinux woes on EL7 with nvidia-container. There are plenty of warnings in that project that it has only been tested on DGX systems running EL7; it would be great if NVIDIA made this policy available for general use with EL7/EL8 and bundled it inside the nvidia-container-runtime package(s).

That should allow us to use rootless podman with GPU acceleration without --security-opt label=disable, but I don't know the security implications of said policy...

UPDATE: Requested that the DGX selinux update be made part of this package in Issue NVIDIA/nvidia-docker#121

oblitum commented 4 years ago

Hi folks, I've hit the same wall as another person: https://github.com/NVIDIA/nvidia-container-toolkit/issues/182. Any idea why that would happen?

zeroepoch commented 3 years ago

@zeroepoch You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh

I finally got around to trying this SELinux module and it worked. I still need to add --security-opt label=type:nvidia_container_t, but that should be more secure than disabling labels. What prompted me to try again was that libpod.conf was deprecated and I was converting my settings to ~/.config/containers/containers.conf. I don't need anything in there anymore with this additional option. Now I just need to figure out how to make it the default, since I pretty much only run nvidia GPU containers.

For anyone who still wants to disable labels to make the CLI simpler, here are the contents of the containers.conf mentioned above:

[containers]
label = false