NERSC / shifter

Shifter - Linux Containers for HPC

Allow the use of unshare for unprivileged user namespaces #317

Open DrDaveD opened 2 years ago

DrDaveD commented 2 years ago

Currently if you try to allocate an unprivileged user namespace inside of shifter, even if the host allows it (e.g. on Perlmutter), an error is returned from the unshare command:

"unshare: unshare failed: Operation not permitted"

Can that be changed to be allowed? With that we should be able to run cvmfsexec and unprivileged singularity under shifter.

scanon commented 2 years ago

@DrDaveD I don't think this is anything we are explicitly doing in Shifter. I think the kernel doesn't allow it for some reason. But I'm not 100% certain.

DrDaveD commented 2 years ago

Normally the kernel allows nesting of unprivileged namespaces.

On the host, can you run unshare -rm and inside that unshare -rm again? Can you run unshare -rm directly inside of shifter?
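The checks asked for above could be scripted roughly as follows. This is a minimal sketch that assumes util-linux's `unshare` is on the PATH; the function names `probe_userns` and `probe_nested_userns` are made up for illustration. The same script can be run on the host and again inside Shifter to compare results.

```shell
#!/bin/sh
# Report whether an unprivileged user+mount namespace can be created here.
# (Function names are hypothetical, chosen for this sketch.)
probe_userns() {
    if unshare -rm true 2>/dev/null; then
        echo "allowed"
    else
        echo "denied"
    fi
}

# Same probe, but started from inside an already-unshared namespace,
# i.e. "unshare -rm" and then "unshare -rm" again.
probe_nested_userns() {
    if unshare -rm unshare -rm true 2>/dev/null; then
        echo "allowed"
    else
        echo "denied"
    fi
}

echo "single: $(probe_userns)"
echo "nested: $(probe_nested_userns)"
```

A "denied" result only says that the call failed, not why. Running `unshare -rm true` by hand shows the actual error message.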

Docker deliberately blocks the unshare() system call (along with others) using seccomp by default. Does Shifter do anything like that? With Docker I need to pass the startup option --security-opt seccomp=unconfined to turn that off.
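For reference, the Docker invocation described above might look like the sketch below. The image name is an assumption (any image that ships util-linux's `unshare` would do), and `docker_unconfined_cmd` is a hypothetical helper that only prints the command line so it can be inspected before being run.

```shell
#!/bin/sh
# Assumed image name for illustration; override with IMAGE=... if needed.
IMAGE="${IMAGE:-debian:stable}"

# Print a docker command line that disables the default seccomp profile
# and then tries to create a user+mount namespace inside the container.
docker_unconfined_cmd() {
    echo "docker run --rm --security-opt seccomp=unconfined $1 unshare -rm true"
}

docker_unconfined_cmd "$IMAGE"
```

Even with seccomp disabled, the call may still be refused for other reasons, such as host settings that restrict unprivileged user namespaces.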

scanon commented 2 years ago

I'm not certain but I think this may be the reason...

From the unshare manpage...

   EPERM (since Linux 3.9)
          CLONE_NEWUSER was specified in flags and the caller is in
          a chroot environment (i.e., the caller's root directory
          does not match the root directory of the mount namespace
          in which it resides).

I'm thinking this may be what is kicking in for Shifter. I would have to dig through the kernel code to know for sure.
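One way to check the man page's chroot rule outside of Shifter is sketched below. It assumes unprivileged user namespaces are enabled and that util-linux's `unshare` and a `chroot` binary are on the PATH. The function name `probe_userns_in_chroot` is hypothetical, and this is not how Shifter sets up its environment. It builds a throwaway chroot inside a user+mount namespace and then tries to create another user namespace from within it.

```shell
#!/bin/sh
# Prints "allowed", "denied", or "skipped" (when the setup itself fails).
probe_userns_in_chroot() {
    dir=$(mktemp -d) || { echo "skipped"; return 0; }
    result=$(unshare -rm sh -c '
        # Bind the whole tree onto $1 so the chroot still has /bin, /lib, etc.
        mount --rbind / "$1" 2>/dev/null || exit 0
        # If chroot itself is refused, print nothing (reported as skipped).
        chroot "$1" true 2>/dev/null || exit 0
        # The actual question: can a chrooted process create a user namespace?
        if chroot "$1" unshare -U true 2>/dev/null; then
            echo allowed
        else
            echo denied
        fi
    ' sh "$dir" 2>/dev/null)
    rmdir "$dir" 2>/dev/null
    echo "${result:-skipped}"
}

echo "userns inside chroot: $(probe_userns_in_chroot)"
```

If the EPERM rule quoted above applies, the expected result would be "denied", but that has not been verified here.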

DrDaveD commented 2 years ago

Good find, that does look like a likely explanation. Could shifter use a mount namespace before doing chroot? That's exactly what unshare -rm creates.

DrDaveD commented 2 years ago

It's possible that chroot itself doesn't work in a mount namespace, at least an unprivileged one. In cvmfsexec I ended up having to use pivot_root instead but it accomplished the same purpose.

scanon commented 2 years ago

Shifter does use a mount namespace in the case where you don't use the WLM integration (which is how I was testing it). This could be a side effect of how some of the mounts are done in the namespace, but I'm not sure.

DrDaveD commented 2 years ago

Maybe it needs pivot_root instead of chroot?
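The pivot_root idea can be tried outside of Shifter with a sketch like the following. It assumes unprivileged user namespaces are enabled and that util-linux's `unshare` and `pivot_root` commands are on the PATH. The function name `probe_userns_after_pivot_root` is hypothetical, and this is neither Shifter's nor cvmfsexec's actual code. It switches the root with `pivot_root` instead of `chroot`, so the process root is the root of the mount namespace, and then tries to create a nested user namespace.

```shell
#!/bin/sh
# Prints "allowed", "denied", or "skipped" (when the setup itself fails).
probe_userns_after_pivot_root() {
    dir=$(mktemp -d) || { echo "skipped"; return 0; }
    result=$(unshare -rm sh -c '
        new=$1
        # new_root must be a mount point, so bind the whole tree onto it.
        mount --rbind / "$new" 2>/dev/null || exit 0
        # put_old must lie under new_root; "$new$new" is the original empty
        # temp directory as seen through the bind mount.
        pivot_root "$new" "$new$new" 2>/dev/null || exit 0
        cd /
        # The actual question: does a nested user namespace work now?
        if unshare -U true 2>/dev/null; then
            echo allowed
        else
            echo denied
        fi
    ' sh "$dir" 2>/dev/null)
    rmdir "$dir" 2>/dev/null
    echo "${result:-skipped}"
}

echo "userns after pivot_root: $(probe_userns_after_pivot_root)"
```

The old root stays mounted under the temporary path in this sketch. A real container setup would normally detach it after the pivot.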

scanon commented 2 years ago

@DrDaveD I'll take a look. It may take me a bit to set up a test instance, but I'll try to experiment and reply.

hufnagel commented 2 years ago

Any news on this (has been a month...)?