dell / dkms

Dynamic Kernel Module Support
GNU General Public License v2.0
657 stars 150 forks source link

dkms does not appear to be chroot safe #145

Open johnnzz opened 3 years ago

johnnzz commented 3 years ago

It appears that dkms does not work correctly in a chroot environment. You can see an example of this by simply doing:

smw:/image_roots # chroot my_image_root/ smw:/ # dkms status /usr/sbin/dkms: line 1980: /dev/fd/62: No such file or directory /usr/sbin/dkms: line 1912: /dev/fd/62: No such file or directory smw:/ #

The reason this is important is many HPC cluster management systems build images off-line in a chroot prior to deployment. And we have been encountering packages that use dkms in the rpm post install scriptlets and these packages fail to install in these contexts.

A lot of these systems use zypper under the covers, like:

smw:/ # zypper --root /image_roots/my_image_root install rocm-dkms

It seems the issue is that some of the functions in dkms rely on on the shell mechanism described here:

https://unix.stackexchange.com/questions/156084/why-does-process-substitution-result-in-a-file-called-dev-fd-63-which-is-a-pipe#:~:text=So%20%2Fdev%2Ffd%2F63,output%20of%20your%20ls%20call.

If those functions could be refactored not to use that mechanism, it seems like dkms should work in the zypper --root context, and if that worked, it should work for this whole class of cluster management systems.

gts-cray commented 3 years ago

Here is an example diff for a workaround we used:

flubber-smw:/var/opt/cray/imps/image_roots # diff gsetterhol_NVIDIA-699/usr/sbin/dkms gsetterhol_NVIDIA-699/usr/sbin/dkms.bak
23,24d22
< set -x
< 
1915,1917c1913
<     local mvka m v k a kern status tmp_file
<     tmp_file=$(mktemp)
<     module_status_weak "$@" > $tmp_file
---
>     local mvka m v k a kern status
1921,1922c1917
<     done < $tmp_file
<     rm $tmp_file
---
>     done < <(module_status_weak "$@")
1986,1988c1981
<     local status mvka m v k a tmp_file
<     tmp_file=$(mktemp)
<     module_status "$@" > $tmp_file
---
>     local status mvka m v k a
2001,2002c1994
<     done < $tmp_file
<     rm $tmp_file
---
>     done < <(module_status "$@")
flubber-smw:/var/opt/cray/imps/image_roots # 

This gets status working in a chroot w/o /dev/, /sys/, and so forth. A better solution might be to use local variables (e.g. no temp file), but hopefully this can help to get started.

evelikov commented 1 year ago

@gts-cray can you make that a MR + ideally some tests?

Glancing at the latest code - we have a mktemp_or_die helper, although more importantly it's being used in a number of places - safe_source() to read the config file, around rpm_safe_upgrade, checking status, making a tarball, extracting/importing from tarball, dummy check we can write to /tmp.

So as-is the diff pasted is very incomplete. I'm more concerned that that w/o any tests, we'll end up breaking it all the time.

dmjacobsen commented 1 year ago

It looks like recently in commit 5111f9cfe56cd357d81d7471fd090459bf905f99 additional checks were put in that definitely don't solve this but give a warning instead. @evelikov, the most obvious automated test for this would need to start a container or a chroot environment, requiring some degree of privilege when running the tests (or at least a software stack supporting user namespaces).

An alternative if the bash process substitution ( <( ... ) ) is the only problem would just be a test that validates that process substitution is not present anywhere in the codebase.

evelikov commented 1 year ago

@dmjacobsen thanks for pointing that commit - I already had a look at it while it was proposed. Our test suite is based on Github Actions, which uses docker + containers under the hood.

Seems like you have some ideas already, so if you can put those in terms of PR that would be great.