NVIDIA / libnvidia-container

NVIDIA container runtime library

rootless podman + ignore_chown_errors = setgroups() fails causing cli to fail #104

Open plopresti opened 4 years ago

plopresti commented 4 years ago

Related: https://github.com/NVIDIA/nvidia-container-runtime/issues/85

My libnvidia-container version is 1.2.0.

I am using rootless podman on RHEL (CentOS) 7.8 and trying out the relatively new ignore_chown_errors option. This mode maps every user inside the container to my own user ID, avoiding the nuisances of UID maps (newuidmap, /etc/subuid, etc.).
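
For reference, the option lives in the containers-storage configuration rather than in podman itself; a minimal sketch of the relevant stanza (file location, driver, and exact section may vary with the containers-storage version):

$ cat ~/.config/containers/storage.conf
[storage]
driver = "vfs"

[storage.options]
# Squash every in-container UID/GID onto the single host UID/GID instead of
# failing when a chown cannot be performed during image extraction.
ignore_chown_errors = "true"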

I followed all of the recommendations at the "related" link above; specifically, I edited config.toml to set no-cgroups to true and the debug path to something in my home directory.
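
Concretely, the edits amount to something like this in config.toml (normally /etc/nvidia-container-runtime/config.toml; the log path below is just an illustration):

[nvidia-container-cli]
# Rootless: the CLI cannot manage device cgroups it does not own.
no-cgroups = true
# Point the debug log somewhere the unprivileged user can write.
debug = "/home/<user>/nvidia-container-cli.log"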

But I get the following error:

$ podman  run -it --rm nvidia/cuda nvidia-smi
Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": OCI runtime error

Running under strace -f is informative:

$ grep 3747 strace.out
3742  clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f00d8218a10) = 3747
3747  prctl(PR_SET_NAME, "nvc:[driver]" <unfinished ...>
...
3747  setgroups(1, [65534] <unfinished ...>
3747  <... setgroups resumed>)          = -1 EPERM (Operation not permitted)
3747  exit_group(1 <unfinished ...>
3747  <... exit_group resumed>)         = ?
3747  +++ exited with 1 +++
3742  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3747, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
3742  wait4(3747, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], WNOHANG, NULL) = 3747

Note that PID 3742 is the main nvidia-container-cli process and PID 3747 is the "nvc:[driver]" sub-process. The driver sub-process calls setgroups(), which fails with EPERM; the sub-process then exits with status 1, and the main process reports the error above.
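
For context on the EPERM: setgroups(2) requires CAP_SETGID, which an unprivileged rootless process does not have, and inside a single-mapping rootless user namespace the kernel can also refuse setgroups() outright via /proc/[pid]/setgroups. A tiny standalone program (not libnvidia-container code, just the raw syscall behavior) reproduces the same refusal when run as a regular user:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <grp.h>

int main(void)
{
        gid_t nogroup = 65534;  /* same GID the nvc:[driver] process tries to set */

        if (setgroups(1, &nogroup) < 0)
                printf("setgroups: %s\n", strerror(errno));  /* EPERM without CAP_SETGID */
        else
                printf("setgroups succeeded (CAP_SETGID available)\n");
        return 0;
}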

The only call to setgroups() in the source code is here:

https://github.com/NVIDIA/libnvidia-container/blob/e6e1c4860d9694608217737c31fc844ef8b9dfd7/src/utils.c#L918

...which is in perm_drop_privileges().

So I commented out the body of perm_drop_privileges(), replaced it with "return 0", recompiled, and installed the hacked libnvidia-container.so.1. And now it works!

$ podman run --rm nvidia/cuda nvidia-smi
Wed Aug 26 22:11:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   38C    P0    38W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100S-PCI...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   39C    P0    36W / 250W |      0MiB / 32510MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
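
In other words, the hack just turns the privilege drop into a no-op, roughly like this (argument list elided here; see the actual definition in src/utils.c):

int
perm_drop_privileges(/* ...original arguments... */)
{
        /* HACK: skip the original setgroups() call and the subsequent uid/gid
         * changes so the driver sub-process keeps the calling user's identity. */
        return (0);
}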

Obviously this is not the right fix, and I do not know enough to say what the right fix is. But when I am running rootless podman with every container UID mapped to my own user, I actually want the container's processes to retain all of my privileges on the host. Perhaps config.toml could gain an option to skip perm_drop_privileges?
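
Something along these lines, say (the option name here is made up purely for illustration):

[nvidia-container-cli]
no-cgroups = true
# Hypothetical switch: keep the invoking user's credentials instead of
# dropping to nobody/nogroup in the driver sub-process.
no-drop-privileges = true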

klueska commented 4 years ago

Thanks for taking the time to thoroughly debug the issue. I will need to think about the proper fix for this, but your description makes it clear what the issue is at least.

plopresti commented 3 years ago

@klueska Any thoughts on this? I can take a crack at a fix if you can describe what the proper fix is...

klueska commented 3 years ago

Sorry for the delayed response. And thanks for the detailed description / debugging of the issue.

I'll need to dig into this a bit to see what the right fix is. In the meantime, are you OK running on your hacked version, or do you need something more stable / official?

plopresti commented 3 years ago

I wound up using Singularity instead, since it already solves all of the problems I was trying to solve with rootless Podman.
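
For anyone else who ends up going the same route, the rough equivalent there is a one-liner; the --nv flag takes care of injecting the GPU devices and driver libraries (image tag illustrative):

$ singularity exec --nv docker://nvidia/cuda:11.0-base nvidia-smi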

Thanks!

klueska commented 3 years ago

For future reference, it looks like a workaround for podman is described here: https://github.com/containers/podman/issues/3659

plopresti commented 3 years ago

Yeah, I saw that. But this problem is specific to rootless podman with the "ignore_chown_errors" option enabled (which maps everything to UID 0 inside the container).
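
Concretely, in that mode the rootless user namespace has only a single mapping, so UID 0 in the container is my host UID and no other UIDs exist, e.g. (host UID 1000 illustrative, on a host with no /etc/subuid entry for the user):

$ podman unshare cat /proc/self/uid_map
         0       1000          1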

klueska commented 3 years ago

Got it. Thanks for clarifying.