Closed: gdm85 closed this issue 9 years ago.
Thanks for reporting!
But this doesn't:
root@localhost:~# pflask --chroot=rootfs --user=root -- id
[✘] Error mounting 'sysfs': Operation not permitted
[✘] Child failed with code '1'
The CLI help text says the default is root for --user, yet it doesn't work if it's specified.
The problem here is that an explicit --user also automatically enables user namespace support. Unfortunately, to use --chroot with user namespaces, the --user-map option is needed (see the example in the README). The man page didn't mention this, so I just pushed a change that clarifies it a bit.
In your case, to use --user and --chroot without user namespaces, you should also pass --no-userns, which explicitly disables userns. This is not ideal, so I may change it in the future (suggestions welcome!).
Also please note I get a segfault when using --netif:
root@localhost:~# pflask --chroot=rootfs --netif -- id
Segmentation fault
Ugh, fixed.
@ghedo thanks! Those issues are indeed fixed. Now if I add --no-userns it works:
root@localhost:~# pflask --chroot=rootfs --user=root --no-userns -- id
uid=0 gid=0
[✔] Child exited
The issue could be closed now; however, I have one question. Maybe you can help me with it, although I suspect what I have in mind is not achievable. I have tried various tools (unshare, contain, LXD, etc.) without success. In a few words, my goal is:
Run a process as user nobody within the container, mapped to a subuid/subgid of a non-root user on the host
I am using a busybox chroot for this experiment
I have seemingly correct subuid/subgid definitions on the host:
root@localhost:~# cat /etc/subgid
root:100000:65536
myuser:296608:65536
myuser:100000:65537
root@localhost:~# cat /etc/subuid
root:100000:65536
myuser:296608:65536
myuser:100000:65537
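Each line of /etc/subuid and /etc/subgid has the form name:first:count, granting that user the host IDs first through first+count-1. The sketch below parses one such line and checks whether two allocations intersect; the struct and function names are hypothetical, not taken from shadow-utils or pflask. Applied to the listing above, it shows that root:100000:65536 and myuser:100000:65537 claim overlapping host IDs, while myuser's two entries do not overlap each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One entry of /etc/subuid or /etc/subgid: "name:first:count".
 * Hypothetical type for illustration only. */
struct subid_range {
    char name[64];
    unsigned long first; /* first subordinate ID on the host */
    unsigned long count; /* number of consecutive IDs */
};

/* Parse one "name:first:count" line (a trailing newline is allowed).
 * Returns true on success. Hypothetical helper. */
static bool parse_subid_line(const char *line, struct subid_range *out)
{
    assert(line != NULL && out != NULL);
    char name[64];
    unsigned long first, count;
    int end = 0;

    /* %63[^:] stops at the first ':' and cannot overflow name[]. */
    if (sscanf(line, "%63[^:]:%lu:%lu%n", name, &first, &count, &end) != 3)
        return false;
    if (line[end] != '\0' && line[end] != '\n')
        return false; /* trailing garbage after the count */
    if (count == 0)
        return false;

    strcpy(out->name, name);
    out->first = first;
    out->count = count;
    return true;
}

/* True if two allocations share at least one host ID
 * (half-open intervals [first, first + count)). */
static bool subid_ranges_overlap(const struct subid_range *a,
                                 const struct subid_range *b)
{
    return a->first < b->first + b->count &&
           b->first < a->first + a->count;
}
```

The check only reports the overlap; whether such overlapping allocations cause problems depends on how the tools that read these files use them.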
First I'd like to map nobody inside the container to a subuid/subgid of root on the host, and maybe later make it work with a non-root user on the host; however, I really can't manage to do this with pflask. For example, I thought this would be the correct syntax:
root@localhost:~# pflask --chroot=rootfs --user-map=0:100000:65536 --user-map=1:100000:65536 --user=nobody -- id
newuidmap: write to uid_map failed: Invalid argument
[✘] newuidmap 256 returned 4248856
But I get that newuidmap error :(
The following command works for me (using only one --user-map):
$ pflask --user-map=0:100000:65536 --user=nobody -- id
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)
As for the --chroot option, pflask currently can't mount sysfs from inside a user namespace, so you need to remove lines 189 and 190 in mount.c to make it work for now. I'm still trying to figure out a proper workaround for this.
For the record, the --chroot option used to work fine with user namespaces in older Linux versions, but then something changed and now it doesn't work anymore.
@ghedo thanks for the example. I am fiddling around with it; I suppose one should mount first and then drop privileges to the selected user?
This is what I am using at the moment:
root@localhost:~# build/pflask --user-map=0:100000:500 --user-map=`id -u nobody`:100500:500 --user=nobody -- id
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)
[✔] Child exited
By using non-overlapping ranges for the mapping I can make nobody work as expected, although I still get permission errors when trying to use --chroot (even after disabling the mount you suggested), because it fails with other mount points too (proc, etc.).
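The "Invalid argument" from newuidmap earlier is consistent with the kernel's rule that the lines of a uid_map (or gid_map) must not overlap: 0:100000:65536 and 1:100000:65536 overlap both inside the container (IDs 0-65535 vs 1-65536) and on the host (both start at 100000). A minimal sketch of that check follows, assuming each --user-map value has the inside:outside:count form used in the commands in this thread; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One --user-map value / one uid_map line: "inside:outside:count". */
struct id_map {
    unsigned long inside;  /* first ID inside the user namespace */
    unsigned long outside; /* first ID in the parent (host) namespace */
    unsigned long count;   /* number of consecutive IDs */
};

/* Parse "inside:outside:count". Hypothetical helper. */
static bool parse_id_map(const char *s, struct id_map *m)
{
    assert(s != NULL && m != NULL);
    char extra;
    /* If %c matches, there was trailing text after the count. */
    if (sscanf(s, "%lu:%lu:%lu%c", &m->inside, &m->outside, &m->count,
               &extra) != 3)
        return false;
    return m->count > 0;
}

/* Half-open interval intersection: [a, a + na) vs [b, b + nb). */
static bool spans_overlap(unsigned long a, unsigned long na,
                          unsigned long b, unsigned long nb)
{
    return a < b + nb && b < a + na;
}

/* Two lines conflict if either their inside ranges or their outside
 * ranges intersect; the kernel rejects such a map with EINVAL. */
static bool id_maps_conflict(const struct id_map *x, const struct id_map *y)
{
    return spans_overlap(x->inside, x->count, y->inside, y->count) ||
           spans_overlap(x->outside, x->count, y->outside, y->count);
}
```

With this check, the pair 0:100000:500 and 65534:100500:500 used above passes, because neither the inside ranges nor the host ranges intersect.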
Also, I think -M should be disallowed with --chroot (or they can work together?)
Btw, if I disable all the mounts in mount.c, I get:
# build/pflask --chroot rootfs --user-map=0:100000:500 --user-map=`id -u nobody`:100500:500 --user=nobody -- id
[✘] Error creating file 'rootfs/dev/console': Permission denied
[✘] Child failed with code '1'
So it needs at least the console device :)
The following command works for me if I remove the sysfs mount code:
$ pflask --user-map=65534:100000:1 --user-map=0:100001:65534 --user=nobody --chroot=$HOME/local/rootfs
In fact, I can map nobody inside the container to my own user outside of it:
$ pflask --user-map=65534:$UID:1 --user-map=0:100001:65534 --user=nobody --chroot=$HOME/local/rootfs
where 65534 is the UID of nobody and $UID is my user's UID.
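With --user-map=65534:$UID:1 --user-map=0:100001:65534, in-container UID 65534 (nobody) is backed by $UID on the host, and in-container UIDs 0-65533 by host UIDs 100001-165534. The sketch below shows that lookup conceptually: find the line whose inside range contains the ID and add the offset to its outside start. The struct, the function name, and the host UID 1000 used in the example are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One mapping line: inside IDs [inside, inside + count) correspond to
 * host IDs [outside, outside + count). */
struct id_map {
    unsigned long inside, outside, count;
};

/* Translate an ID seen inside the namespace to the host ID.
 * Returns -1 if no mapping line covers the ID. Hypothetical helper. */
static long map_inside_to_host(const struct id_map *maps, size_t n,
                               unsigned long id)
{
    assert(maps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Written as a subtraction so the upper bound cannot overflow. */
        if (id >= maps[i].inside && id - maps[i].inside < maps[i].count)
            return (long)(maps[i].outside + (id - maps[i].inside));
    }
    return -1;
}
```

For example, with the two lines {65534, 1000, 1} and {0, 100001, 65534}, UID 65534 resolves to 1000, UID 0 to 100001, and UID 65535 is not mapped at all.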
Also, I think -M should be disallowed with --chroot (or they can work together?)
If you run pflask as root it should work just fine. However, pflask doesn't unmount all the additional mount points on exit (this is normally unnecessary, since the mount points are purged automatically when the mount namespace is destroyed), so you'll have to unmount them manually.
@ghedo I've pinpointed the problem; see also this commit, where I introduced a flag for testing under different kernels.
With kernel 3.13.0-45 I have to build with everything disabled (kernelQuality=0); with kernels 3.19.0-28 and 3.16.0-45 I can use your workaround (disabling only the sysfs mount, kernelQuality=1). At this point I am looking for a kernel where everything can be mounted (sysfs too, kernelQuality=2).
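The three kernelQuality levels can be read as a simple policy over which filesystems get mounted. The sketch below is a hypothetical model of that description (the enum, the function, and the reading of "everything disabled" as "no mounts at all" are assumptions), not the code from the commit mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the kernelQuality levels described above:
 *   0 = no mounts at all          (e.g. 3.13.0-45)
 *   1 = every mount except sysfs  (e.g. 3.16.0-45, 3.19.0-28)
 *   2 = every mount, sysfs too */
enum kernel_quality { KQ_NO_MOUNTS = 0, KQ_NO_SYSFS = 1, KQ_ALL_MOUNTS = 2 };

/* Decide whether a mount of the given filesystem type should be attempted. */
static bool mount_enabled(enum kernel_quality q, const char *fstype)
{
    assert(fstype != NULL);
    switch (q) {
    case KQ_NO_MOUNTS:
        return false;
    case KQ_NO_SYSFS:
        return strcmp(fstype, "sysfs") != 0;
    case KQ_ALL_MOUNTS:
        return true;
    }
    return false; /* unknown level: be conservative */
}
```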
This issue is similar, and it is correctly attributed to an upstream kernel bug (the lack of this kernel commit), as explained here: https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg132608.html
I suggest not supporting buggy kernels at all; I am going to test with more recent kernels to see whether it has already been fixed, or whether it needs to be done in a different way.
Edit: I originally thought that 3.16.0-45 was unaffected by the sysfs issue, but it is affected too.
No luck with Vivid's kernel 3.19.8 :(
At this point I give up; I will use the chroots without sysfs...
I'm on Linux 4.2 now (which is supposed to have the commit you linked) but still no luck with sysfs. I also tried LXC and it works fine, so I'm guessing there's also a problem in pflask. I'll try to see what LXC does that pflask doesn't.
Also thanks for your research :)
You're welcome :)
@gdm85 I found that the reason why sysfs can't be mounted with userns is this commit https://github.com/torvalds/linux/commit/7dc5dbc879bd0779924b5132a48b731a0bc04a1e, which prevents mounting sysfs unless you have CAP_SYS_ADMIN rights over the network namespace (the reason why I remember this working is probably that when I first implemented this I was on an even older kernel).
The only workaround is to enable a network namespace with --netif, which is kind of a problem because unprivileged containers can't create network interfaces, so they would lose networking completely...
Also, the reason why LXC seems to work is that they enable the netns by default, so there's really no other way around this. The good news is that it's not a pflask bug :/
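The netns requirement can be observed directly. The following Linux-only sketch (a hypothetical test helper, not pflask code) forks a child that unshares a user and mount namespace, maps itself to root with a single uid_map line, and tries to mount sysfs read-only on a temporary directory, once without and once with CLONE_NEWNET. Given the commit linked above, the attempt without a new network namespace should fail with EPERM; the one with it can succeed on a typical host, but may still fail where unprivileged user namespaces are disabled or sysfs is not fully visible (for example inside some containers).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map root in the new user namespace to the caller's original effective
 * UID with a single uid_map line (what `unshare -r` does for UIDs). */
static int map_self_as_root(uid_t outer_euid)
{
    char line[64];
    int len = snprintf(line, sizeof(line), "0 %u 1\n", (unsigned)outer_euid);
    int fd = open("/proc/self/uid_map", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, line, (size_t)len);
    int saved = (w < 0) ? errno : EIO;
    close(fd);
    if (w == len)
        return 0;
    errno = saved;
    return -1;
}

/* Try to mount sysfs from fresh user + mount namespaces, optionally also
 * unsharing the network namespace. Returns 0 on success or -errno. */
static int probe_sysfs_mount(int with_netns)
{
    assert(with_netns == 0 || with_netns == 1);
    char dir[] = "/tmp/sysfs-probe-XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -errno;

    uid_t outer_euid = geteuid();
    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        rmdir(dir);
        return -err;
    }
    if (pid == 0) {
        int flags = CLONE_NEWUSER | CLONE_NEWNS | (with_netns ? CLONE_NEWNET : 0);
        if (unshare(flags) < 0)
            _exit(errno);
        if (map_self_as_root(outer_euid) < 0)
            _exit(errno);
        /* Make sure nothing we mount propagates out of this namespace. */
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
            _exit(errno);
        /* sysfs requires CAP_SYS_ADMIN over the owner of the current
         * network namespace, which we only have after CLONE_NEWNET. */
        if (mount("sysfs", dir, "sysfs",
                  MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL) < 0)
            _exit(errno);
        _exit(0); /* the mount disappears with our mount namespace */
    }

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    rmdir(dir);
    if (r < 0 || !WIFEXITED(status))
        return -ECHILD; /* e.g. child killed by a seccomp filter */
    return -WEXITSTATUS(status);
}
```

Calling probe_sysfs_mount(0) and probe_sysfs_mount(1) and printing the negated results with strerror() should reproduce the difference discussed here.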
I'll try to somehow explain this in the documentation.
@ghedo this sounds familiar; maybe I've read about it in some old Docker issue. Would it be possible to mount sysfs before cloning (and thus dropping the network namespace)?
On an unrelated topic, I am experimenting with calling pflask through a symlink (probably pflask-drink), so that AppArmor restrictions can be applied only to the symlink-invoked binary, which would be responsible for running the user-requested process under the additional AppArmor restrictions. In such a scenario, it would also make sense to perform all operations requiring elevated privileges before running the containerized process.
Would it be possible somehow to mount sysfs before cloning (and thus dropping the network namespace)?
Maybe pflask could do a first clone creating only a mount namespace, mount sysfs and then clone again to create the other namespaces, but I'm not sure it would be worth the hassle since it still wouldn't work with unprivileged containers.
@ghedo I was thinking more along these lines: run all the setup stages before dropping privileges and capabilities. It's slightly more complex.
Hi Alessandro.
I've found various issues while trying to use pflask. For example, this command works:
But this doesn't:
The CLI help text says the default is root for --user, yet it doesn't work if it's specified.
Also please note I get a segfault when using --netif:
According to GDB, the problem is at:
#2 0x000000000040962c in validate_optlist (name=0x40d012 "--netif", opts=0x0) at ../src/pflask.c:323
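The backtrace shows validate_optlist being called with opts=0x0, i.e. --netif given without a value, so the crash comes from dereferencing a NULL option string. The actual fix is not shown in this thread. As a sketch of the kind of guard that avoids it, the hypothetical function below validates a comma-separated option value and rejects a missing value before touching it; the function name and the comma-separated format are assumptions, not pflask's real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical validator for a comma-separated option value.
 * Returns the number of fields, or -1 if the value is missing (NULL or
 * empty) or contains an empty field. Checking for NULL first avoids a
 * crash like the one above, where opts was 0x0. */
static int count_optlist_fields(const char *name, const char *opts)
{
    assert(name != NULL);
    if (opts == NULL || *opts == '\0') {
        fprintf(stderr, "%s: missing value\n", name);
        return -1;
    }
    int fields = 1;
    const char *field_start = opts;
    for (const char *p = opts; ; p++) {
        if (*p == ',' || *p == '\0') {
            if (p == field_start) { /* empty field, e.g. "a,,b" or "a," */
                fprintf(stderr, "%s: empty field in '%s'\n", name, opts);
                return -1;
            }
            if (*p == '\0')
                break;
            fields++;
            field_start = p + 1;
        }
    }
    return fields;
}
```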