NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

nvidia-container-cli mount error (operation not permitted) #147

Closed by mjg0 2 years ago

mjg0 commented 3 years ago

When I run enroot with a container that uses GPUs on RHEL 7.9, a failure to mount some firmware derails enroot:

$ enroot start pccl+containertest+0.1
nvidia-container-cli: mount error: mount operation failed: /tmp/enroot/pccl+containertest+0.1/usr/lib/firmware/nvidia/470.57.02: operation not permitted
[ERROR] /apps/enroot/3.4.0/gcc-11.2.0/etc/enroot/hooks.d/98-nvidia.sh exited with return code 1

The container in question is docker://pccl/containertest:0.1, but I wouldn't try downloading it unless you have a lot of storage.

I'm not sure if this is an issue with enroot or with nvidia-container-cli, but I figured I'd post here first to get some context.

3XX0 commented 3 years ago

Moved to libnvidia-container, since this looks like an issue with the new firmware path.

@mjg0 you can try adding a strace in front of the CLI in the 98-nvidia.sh hook to see where it is coming from.

elezar commented 2 years ago

Hi @mjg0 / @3XX0. In order to support devices using GSP firmware (e.g. A100 80GB devices), we mount the firmware from the host into the container.

As @3XX0 mentions, the strace output would be useful as well as some more information on the system you are seeing this on.

mjg0 commented 2 years ago

Our cluster has two types of GPU nodes: one type with 2 K80s and 2 Haswell Xeon E5-2670s, and another type with 4 P100s and 2 Broadwell Xeon E5-2680s. The OS (including /usr/lib/firmware) for all nodes is NFS-mounted, and the extracted container is in /tmp which is the node-local disk. The failure is identical on both types of nodes, and even on non-GPU nodes with different CPUs.

Both enroot and libnvidia-container were installed from source using GCC 11.2.0:

# enroot build
git clone --recurse-submodules https://github.com/NVIDIA/enroot.git
cd enroot
git checkout tags/v3.4.0
DESTDIR=/apps/enroot/3.4.0/gcc-11.2.0 make -j install prefix=
# I had to comment out the `/etc/hostname` line in the fstab config file since our OS has no /etc/hostname

# libnvidia-container build
# download release from Github, extract, then:
DESTDIR=/apps/libnvidia-container/1.5.1/gcc-11.2.0 make -j install prefix=
# the RUNPATH was `$ORIGIN/../$LIB`, I had to change it so it could find libnvidia-container.so.1
patchelf --set-rpath '$ORIGIN/../lib' nvidia-container-cli

Here is the log resulting from adding strace -o enroot-strace.log after the exec on the last line of 98-nvidia.sh: enroot-strace.log.

3XX0 commented 2 years ago

mount(NULL, "/tmp/enroot/pccl+containertest+0.1/usr/lib/firmware/nvidia/470.57.02", NULL, MS_NOSUID|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) = -1 EPERM (Operation not permitted)

Looks like a bug in the mount code: it unconditionally tries to set nosuid and noexec, whereas the remount needs to account for the flags already set on the filesystem. As a workaround for now, you can probably downgrade libnvidia-container to a version from before the firmware directory was included.
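For readers following along, here is a minimal sketch of what "accounting for the filesystem flags" could look like. It is not the code from libnvidia-container or from the eventual fix. The function names `statvfs_to_mount_flags` and `remount_bind_keep_flags` are hypothetical, and the sketch assumes Linux with glibc. The idea is to read the flags the mount already carries with statvfs(3), translate them to their mount(2) equivalents, and OR them into the remount request, so that the remount does not try to drop a flag it is not allowed to change (mount(2) lists EPERM for remounts that would alter locked flags).

```c
#define _GNU_SOURCE            /* exposes ST_NODEV, ST_NOEXEC, ST_NOATIME, ... in glibc */
#include <stddef.h>
#include <sys/mount.h>
#include <sys/statvfs.h>

/* Hypothetical helper: translate statvfs(3) f_flag bits into mount(2) MS_* bits. */
unsigned long statvfs_to_mount_flags(unsigned long st_flags)
{
    unsigned long ms = 0;

    if (st_flags & ST_RDONLY)     ms |= MS_RDONLY;
    if (st_flags & ST_NOSUID)     ms |= MS_NOSUID;
    if (st_flags & ST_NODEV)      ms |= MS_NODEV;
    if (st_flags & ST_NOEXEC)     ms |= MS_NOEXEC;
    if (st_flags & ST_NOATIME)    ms |= MS_NOATIME;
    if (st_flags & ST_NODIRATIME) ms |= MS_NODIRATIME;
    if (st_flags & ST_RELATIME)   ms |= MS_RELATIME;
    return ms;
}

/*
 * Hypothetical helper: remount an existing bind mount at `path`, adding the
 * flags in `extra` (for example MS_NOSUID | MS_NOEXEC) while keeping whatever
 * flags the mount already has. Returns 0 on success and -1 on error, with
 * errno set by statvfs(3) or mount(2).
 */
int remount_bind_keep_flags(const char *path, unsigned long extra)
{
    struct statvfs st;

    if (statvfs(path, &st) < 0)
        return -1;

    unsigned long flags = MS_BIND | MS_REMOUNT | extra
                        | statvfs_to_mount_flags(st.f_flag);

    return mount(NULL, path, NULL, flags, NULL);
}
```

With this, the failing call in the strace above would become something like `remount_bind_keep_flags(path, MS_NOSUID | MS_NOEXEC)`. Whether this alone resolves the EPERM on an NFS-backed /usr/lib/firmware inside enroot's namespace is an assumption that would need testing.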

mjg0 commented 2 years ago

I downgraded libnvidia-container to 1.4.0 and it worked--thank you for the insight! @elezar if you know what code needs to be changed I can test a patch, or try to fix it and submit a pull request if you point me in the right direction.

elezar commented 2 years ago

@mjg0 thanks for confirming that downgrading works. I will share some links to code locations if you're still up for getting something working on your end.

For the time being, could you provide more information on the properties of the /lib/firmware/nvidia/470.57.02 folder on your system? (I don't have ready access to a system that uses GSP firmware.)

mjg0 commented 2 years ago

The only file in that directory is gsp.bin. It's an ELF, but without execute permissions.

I'm certainly good to look around a bit--where do you think I should start?

elezar commented 2 years ago

Thanks @mjg0. The mount that is failing would be the one here which is called for the firmware directory here. The firmware directory is currently the only element of info->dirs at the call site.

A "quick and dirty" approach to get this fixed on your end would be to create another mount_firmware_directory function that has the correct mount properties and call that instead. Something that queries the filesystem flags and sets them could then be added as a follow-up.

elezar commented 2 years ago

@mjg0 we have a merge request out where we are testing a fix for this. If you get time to test things with these changes applied on your end, that would be useful.

klueska commented 2 years ago

FYI: https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/merge_requests/124#note_820054894

elezar commented 2 years ago

libnvidia-container-1.8.0-rc.2 is now live with a fix for this behaviour.

Please see #111 (comment) for instructions on how to get access to this RC (or wait for the full release).

klueska commented 2 years ago

libnvidia-container-1.8.0 with a fix for this is now GA

Release notes here: https://github.com/NVIDIA/libnvidia-container/releases/tag/v1.8.0