Sacred-Salamander opened this issue 1 year ago
https://github.com/containers/bubblewrap/commit/75c2d94de8a6a3f13619aecf3d5a2a5276942a88
Add support for --userns and --userns2
This allows you to reuse an existing user namespace to set up all the other namespaces, entering that instead of creating a new one. The reason you want to do this is that you can then also reuse other namespaces that are owned by the user namespace. Typically you use this to partially re-enter a previously created bubblewrap sandbox.
This also adds --userns2 which is similar to --userns, but this is switched into at the end instead of the start. Bubblewrap sometimes creates nested such user namespaces[1], and to be able to reuse such a setup we need to similarly reuse both namespaces via --userns2.
Technically, using setns() is probably safe even in the privileged case, because we got passed in a file descriptor to the namespace, and that can only be obtained if you have ptrace permissions against the target, and then you could do whatever to the namespace anyway. However, for practical reasons this isn't usable for bwrap, because (as described in a comment in acquire_privs()) setuid mode causes root to own the namespaces that it creates. So as you will not be able to access these namespaces for reuse anyway, it's best to disable it (in case of unexpected security issues).
[1] This is to work around an issue with mounting devpts without uid 0 mapped in the user namespace, where the outer namespace owns all the other namespaces but the inner one has the right mappings.
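Both options take a file descriptor number rather than a path; in the PoC in this thread the shell opens /proc/PID/ns/user on that descriptor with a redirection such as `3</proc/PID/ns/user`. Before handing such a descriptor to bwrap, it can help to confirm which user namespace it actually refers to. A minimal sketch, assuming Linux procfs and a POSIX shell; `show_userns_fd` is a hypothetical helper, not part of bubblewrap:

```shell
# Hypothetical helper (not part of bubblewrap): open the user namespace of
# process $1 on fd 3, the same way "--userns 3 3</proc/PID/ns/user" does,
# and print what that fd points at, in the form "user:[<inode>]".
show_userns_fd() {
  # The redirection applies only to this readlink; /proc/self is readlink
  # itself, so /proc/self/fd/3 is the namespace file opened here.
  readlink /proc/self/fd/3 3<"/proc/$1/ns/user"
}
```

For example, `show_userns_fd 80157` on the host should print the same identifier as `readlink /proc/80157/ns/user`, which is also the number lsns shows in its NS column.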
bwrap: Setting userns2 failed: Invalid argument
Reason:
EINVAL The caller attempted to join the user namespace in which it is already a member.
You need to pass a different userns to --userns2 that is a child of --userns because:
EINVAL The caller tried to join an ancestor (parent, grandparent, and so on) PID namespace.
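Since the first EINVAL case quoted above is simply joining a user namespace the process is already in, a cheap pre-flight check is to compare the two /proc/PID/ns/user links before calling bwrap. A minimal sketch, assuming Linux procfs; `userns_pair_ok` is a hypothetical name. It only catches the "same namespace" mistake: these links alone cannot tell whether the second namespace is actually a child of the first.

```shell
# Hypothetical pre-flight check for "--userns <outer> --userns2 <inner>":
# fail if both PIDs are in the same user namespace, because joining the
# namespace you are already in makes setns(2) fail with EINVAL.
# Returns 0 if they differ, 1 if they are the same, 2 if a PID is unreadable.
userns_pair_ok() {
  outer=$(readlink "/proc/$1/ns/user") || return 2
  inner=$(readlink "/proc/$2/ns/user") || return 2
  if [ "$outer" = "$inner" ]; then
    echo "PIDs $1 and $2 share $outer; --userns2 would fail with EINVAL" >&2
    return 1
  fi
  return 0
}
```

Usage with the PoC's pid files would be `userns_pair_ok "$(cat /tmp/pid1)" "$(cat /tmp/pid2)" && bwrap ...`.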
Can you please give an example? I also found most of that information, but it is not clear how to find the child of the first namespace or how to pass it, since they are either effectively the same pid or the nested namespace belongs to pid 1.
Would it be like /proc/80157/root/proc/80157/ns/user?
I don't have a full, working example. But I'm questioning whether you have understood user namespaces and how they nest.
edit: This was the reason for two user namespaces:
I think I mostly understand it, but correct me if I don't make sense.
bwrap --unshare-user --dev-bind / / --tmpfs /tmp /bin/bash
For example, lsns on the host shows this:
$ lsns
NS TYPE NPROCS PID USER COMMAND
4026533021 user 2 80157 user /bin/bash
4026533022 mnt 1 80157 user /bin/bash
And inside the example shell:
$ lsns
NS TYPE NPROCS PID USER COMMAND
4026531834 time 2 80157 user /bin/bash
4026531835 cgroup 2 80157 user /bin/bash
4026531836 pid 2 80157 user /bin/bash
4026531838 uts 2 80157 user /bin/bash
4026531839 ipc 2 80157 user /bin/bash
4026531992 net 2 80157 user /bin/bash
4026533021 user 2 80157 user /bin/bash
4026533022 mnt 2 80157 user /bin/bash
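lsns lists each namespace under the lowest PID it finds in it, so its output inside and outside the sandbox is hard to compare directly. A more direct check is to compare each /proc/PID/ns/* link of the sandboxed process with the caller's own. A minimal sketch, assuming Linux procfs and that you may read the target's ns links (normally true for your own processes); `own_namespaces` is a hypothetical helper name:

```shell
# Hypothetical helper: print each namespace type in which process $1 differs
# from the current shell, together with the target's namespace identifier.
# Types the kernel does not support (e.g. "time" before Linux 5.6) or links
# we may not read are skipped silently.
own_namespaces() {
  for t in cgroup ipc mnt net pid time user uts; do
    theirs=$(readlink "/proc/$1/ns/$t" 2>/dev/null) || continue
    ours=$(readlink "/proc/$$/ns/$t" 2>/dev/null) || continue
    if [ "$theirs" != "$ours" ]; then
      echo "$t $theirs"
    fi
  done
}
```

Run from a host shell, `own_namespaces 80157` should list only the namespaces bwrap actually created for that process; every type it does not print is shared with the host shell.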
I'm sorry, it is correct that the namespace doesn't belong to a process. Is it correct to say that the process has attached namespaces?
Finally, I have an ugly, working PoC.
Run in the first terminal:
unshare --map-root-user --fork sh -c "echo \$\$ >/tmp/pid1 && unshare -U --fork sh -c \"echo \\\$\\\$ >/tmp/pid2 && sleep 10m && true\" && true"
Run in the second terminal:
bwrap --userns 3 3</proc/$(cat /tmp/pid1)/ns/user --userns2 4 4</proc/$(cat /tmp/pid2)/ns/user --dev-bind / / ls
I'm sorry, it is correct that the namespace doesn't belong to a process. Is it correct to say that the process has attached namespaces?
The man page describes it as:
A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes. One use of namespaces is to implement containers.
Bubblewrap creates the namespaces when I run the example.
At this point, an fd for the --userns namespace is only obtainable via an ioctl, I guess?
Great PoC!
I guess that answers how to use the option.
But is it possible to use it with the example I made? As far as I can tell, bubblewrap also creates a nested namespace for me from that; is it fundamentally different? Can I not get the intermediate pid from the host?
At this point, an fd for the --userns namespace is only obtainable via an ioctl, I guess?
I don't know; is it not reachable via procfs?
Can I not get the intermediate pid from the host?
If you share the pidns there is no intermediate pid, because there is no need to fork twice.
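Whether a bwrap invocation produced a process in a second, nested user namespace can be checked from the host by walking the descendants of the bwrap PID and comparing their user namespace links with the starting process's. A minimal sketch, assuming Linux procfs; `children_of` and `nested_userns_pids` are hypothetical helper names. It can only report processes that exist: a user namespace with no direct member process will not show up this way.

```shell
# Hypothetical helpers, assuming Linux procfs.
# children_of PID: print the PIDs whose parent PID is PID, read from the
# 4th field of /proc/*/stat. The command name in parentheses may contain
# spaces, so everything up to the last ") " is stripped first.
children_of() {
  for s in /proc/[0-9]*/stat; do
    { read -r stat < "$s"; } 2>/dev/null || continue  # process may have exited
    rest=${stat##*") "}    # "STATE PPID ..." after removing "PID (comm) "
    ppid=${rest#* }        # drop the state letter
    ppid=${ppid%% *}
    if [ "$ppid" = "$1" ]; then
      pid=${s#/proc/}
      echo "${pid%/stat}"
    fi
  done
}

# nested_userns_pids PID: breadth-first walk over PID's descendants, printing
# "PID user:[...]" for every descendant whose user namespace differs from PID's.
nested_userns_pids() {
  base=$(readlink "/proc/$1/ns/user") || return 1
  queue=$1
  while [ -n "$queue" ]; do
    parent=${queue%% *}
    queue=${queue#"$parent"}
    queue=${queue# }
    for kid in $(children_of "$parent"); do
      queue=${queue:+$queue }$kid
      ns=$(readlink "/proc/$kid/ns/user" 2>/dev/null) || continue
      if [ "$ns" != "$base" ]; then
        echo "$kid $ns"
      fi
    done
  done
}
```

Passing the PID of the bwrap process as seen on the host lists the sandbox processes in a different user namespace; if two different `user:[...]` values appear among them, there are two distinct user namespaces that have member processes.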
I don't know; is it not reachable via procfs?
I don't know of a way, since we do not know any process in this userns (i.e. a direct member).
Would it be possible with this example if I also unshare the pidns?
bwrap --unshare-user --unshare-pid --dev-bind / / --proc /proc --tmpfs /run --tmpfs /tmp /bin/bash
About this part:
Bubblewrap sometimes creates nested such user namespaces[1], and to be able to reuse such a setup we need to similarly reuse both namespaces via --userns2.
I think this is what the example/test should show: how to enter such a setup when it was created by bubblewrap rather than by the unshare program.
You also said that there is no intermediate pid when you share the pidns, as it doesn't fork twice. I'm not fully following this: does it mean that my first example does not create a nested user namespace at all, or just one that I can't see in any way? Does it mean my latest example from the post above creates it but I still can't access it? When and how is it possible to use --userns2 with bubblewrap as the initiator, as the commit message suggests?
I'm trying to get a grasp on nested namespaces and how to enter them with --userns2. There are no examples or tests that I can find.
For example, when I run
I believe this creates a nested user namespace: using lsns inside, I can see a complete namespace set (time, cgroup, uts, ipc, net, user, mnt, pid), while lsns on the host shows only 2 namespaces for the process (user, mnt).
How can I make this work?
I tried with
but it results in this error
What should I feed into the file descriptor here?
I'm also wondering about this text in the bwrap manual about the option
Can anyone fully explain how this works, i.e. when bubblewrap creates nested namespaces and when it doesn't? What are the kernel issues that are being worked around? Are there any upstream mailing-list conversations about it?