Missing examples / tests for --userns2 option

Sacred-Salamander commented 1 year ago

Trying to get a grasp on nested namespaces and how to enter them with --userns2. There are no examples or tests that I can find.

For example when I run

bwrap --unshare-user --dev-bind / / --tmpfs /tmp /bin/bash

I believe this creates a nested user namespace. Inside the sandbox, lsns shows a complete namespace set (time, cgroup, uts, ipc, net, user, mnt, pid), while lsns on the host shows only 2 namespaces for the process (user, mnt).

How can I make this work?

bwrap --userns 11 --userns2 12 --dev-bind / / /bin/bash 11</proc/80157/ns/user 12</proc/???/ns/user

I tried with

bwrap --userns 11 --userns2 12 --dev-bind / / /bin/bash 11</proc/80157/ns/user 12</proc/80157/ns/user

but it results in this error:

bwrap: Setting userns2 failed: Invalid argument

What should I feed into the file descriptor here?
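For illustration, the redirection at the end of the command is what hands bwrap an already-open descriptor on a namespace file; --userns and --userns2 then take the descriptor number. Below is a minimal sketch of just that mechanism, without bwrap. The function name show_userns_fd is made up for this example, and it uses descriptor 3 on the assumption that some shells only accept single-digit descriptors in redirections.

```shell
# Open the user-namespace file of PID $1 on descriptor 3 and let a child
# process report what that descriptor refers to.
# show_userns_fd is a hypothetical helper name, not part of bwrap.
show_userns_fd() {
    sh -c 'readlink /proc/self/fd/3' sh 3<"/proc/$1/ns/user"
}
```

Calling show_userns_fd with the PID of the sandboxed shell should print an identifier of the form user:[inode], which is the namespace the descriptor points at.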

I'm also wondering about this text in the bwrap manual describing the option:

This is useful because sometimes bubblewrap itself creates nested user namespaces (to work around some kernel issues) and --userns2 can be used to enter these.

Can anyone fully explain how this works: when does bubblewrap create nested namespaces, and when doesn't it? What are the kernel issues being worked around? Are there any upstream mailing list conversations about it?

rusty-snake commented 1 year ago

https://github.com/containers/bubblewrap/commit/75c2d94de8a6a3f13619aecf3d5a2a5276942a88

Add support for --userns and --userns2

This allows you to reuse an existing user namespace to set up all the other namespaces, entering that instead of creating a new one. The reason you want to do this is that you can then also reuse other namespaces that are owned by the user namespace. Typically you use this to partially re-enter a previously created bubblewrap sandbox.

This also adds --userns2 which is similar to --userns, but this is switched into at the end instead of the start. Bubblewrap sometimes creates nested such user namespaces[1], and to be able to reuse such a setup we need to similarly reuse both namespaces via --userns2.

Technically using setns() is probably safe even in the privileged case, because we got passed in a file descriptor to the namespace, and that can only be gotten if you have ptrace permissions against the target, and then you could do whatever to the namespace anyway. However, for practical reasons this isn't useable for bwrap, because (as described in a comment in acquire_privs()) setuid mode causes root to own the namespaces that it creates. So as you will not be able to access these namespaces for reuse anyway, it's best to disable it (in case of unexpected security issues).

[1] This is to work around an issue with mounting devpts without uid 0 mapped in the user namespace, where the outer namespace owns all the other namespaces but the inner one has the right mappings.

bwrap: Setting userns2 failed: Invalid argument

Reason:

EINVAL The caller attempted to join the user namespace in which it is already a member.

You need to pass a different userns to --userns2 that is a child of --userns because:

EINVAL The caller tried to join an ancestor (parent, grandparent, and so on) PID namespace.
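The "already a member" case can be checked before calling bwrap by comparing the namespace identifiers that procfs exposes. A minimal sketch follows; the helper names userns_id and distinct_userns are made up here, and the check only detects the "same namespace" case, not whether one namespace is actually a child of the other.

```shell
# Print the user-namespace identifier of PID $1, e.g. "user:[4026533021]".
# (Hypothetical helper name.)
userns_id() {
    readlink "/proc/$1/ns/user" 2>/dev/null
}

# Return 0 if the two PIDs are members of different user namespaces,
# 1 if they share one, and 2 if either identifier cannot be read.
distinct_userns() {
    a=$(userns_id "$1") || return 2
    b=$(userns_id "$2") || return 2
    [ "$a" != "$b" ]
}
```

With the command from the original report, distinct_userns 80157 80157 returns non-zero, which matches the EINVAL from setns(): both descriptors refer to the same user namespace.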

Sacred-Salamander commented 1 year ago

Can you please give an example? I also found most of that information, but it is not clear how I find or pass the child of the first namespace, since they are either effectively the same pid or the nested namespace belongs to pid 1.

Would it be like /proc/80157/root/proc/80157/ns/user?

rusty-snake commented 1 year ago

I don't have a (full, working) example. But I'm questioning whether you understand user namespaces and how they nest.

How do you create the user namespaces you want to pass?
How did you determine the pid the userns belongs to? (Which technically doesn't make sense, since a user namespace cannot belong to a process.)

edit: This was the reason for two user namespaces:

https://github.com/containers/bubblewrap/blob/bb7ac1348f98ee48f1e2e38bdf93abca2e4f6d06/bubblewrap.c#L3008-L3011

Sacred-Salamander commented 1 year ago

I think I mostly understand it, but correct me if I don't make sense.

Bubblewrap creates the namespaces when I run the example

bwrap --unshare-user --dev-bind / / --tmpfs /tmp /bin/bash

I can determine the pid of the process in several ways: lsns on the host, lsns in the bash shell I launched, and echo $$ in the example shell all report the same pid.

For example lsns on the host says this

$ lsns
        NS TYPE   NPROCS     PID USER COMMAND
4026533021 user        2   80157 user /bin/bash
4026533022 mnt         1   80157 user /bin/bash

And inside the example shell

$ lsns
        NS TYPE   NPROCS   PID USER COMMAND
4026531834 time        2 80157 user /bin/bash
4026531835 cgroup      2 80157 user /bin/bash
4026531836 pid         2 80157 user /bin/bash
4026531838 uts         2 80157 user /bin/bash
4026531839 ipc         2 80157 user /bin/bash
4026531992 net         2 80157 user /bin/bash
4026533021 user        2 80157 user /bin/bash
4026533022 mnt         2 80157 user /bin/bash

I'm sorry, it is correct that the namespace doesn't belong to a process. Is it correct to say that the process has attached namespaces?
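As a cross-check of the two lsns listings, the namespaces a process is a member of can also be read directly from /proc/PID/ns and compared with those of the current shell. This is only a sketch; the function name ns_diff is made up for this example, and it assumes the caller is allowed to read the target's ns links.

```shell
# Print the namespace types (user, mnt, pid, ...) in which PID $1 is a
# member of a different namespace than the process doing the comparison.
# ns_diff is a hypothetical helper name.
ns_diff() {
    for link in "/proc/$1/ns/"*; do
        type=${link##*/}
        theirs=$(readlink "$link" 2>/dev/null) || continue
        ours=$(readlink "/proc/self/ns/$type" 2>/dev/null) || continue
        [ "$theirs" = "$ours" ] || printf '%s\n' "$type"
    done
}
```

Run from a host shell as ns_diff 80157, this would be expected to list only the types that differ from the host's, which is another way of looking at "which namespaces the process has attached".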

rusty-snake commented 1 year ago

Finally, I have an ugly but working PoC.

run in first terminal:

unshare --map-root-user --fork sh -c "echo \$\$ >/tmp/pid1 && unshare -U --fork sh -c \"echo \\\$\\\$ >/tmp/pid2 && sleep 10m && true\" && true"

run in second terminal:

bwrap --userns 3 3</proc/$(cat /tmp/pid1)/ns/user --userns2 4 4</proc/$(cat /tmp/pid2)/ns/user --dev-bind / / ls
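The second step can be wrapped so that it fails early when a PID file is missing or stale, instead of handing bwrap a bad descriptor. This is a sketch under the same assumptions as the PoC: the first terminal has written the outer PID to /tmp/pid1 and the inner PID to /tmp/pid2. The function names userns_path_from_pidfile and enter_nested are made up for this example; the bwrap options are the ones used above.

```shell
# Read a PID from file $1, check that it is numeric and that the process
# still exists, then print the path of its user-namespace file.
# (Hypothetical helper name.)
userns_path_from_pidfile() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;    # empty or not a plain number
    esac
    [ -d "/proc/$pid" ] || return 1 # process is gone
    printf '%s\n' "/proc/$pid/ns/user"
}

# Enter the outer namespace via --userns and the inner one via --userns2,
# mirroring the command in the PoC. $1 and $2 are the two PID files.
enter_nested() {
    outer=$(userns_path_from_pidfile "$1") || return 1
    inner=$(userns_path_from_pidfile "$2") || return 1
    bwrap --userns 3 --userns2 4 --dev-bind / / ls 3<"$outer" 4<"$inner"
}
```

In the second terminal this would be called as enter_nested /tmp/pid1 /tmp/pid2 while the sleep from the first terminal is still running.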

I'm sorry, it is correct that the namespace doesn't belong to a process. Is it correct to say that the process has attached namespaces?

The namespaces(7) man page describes it as:

A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes. One use of namespaces is to implement containers.

Bubblewrap creates the namespaces when I run the example

At this point an fd to --userns userns is only reachable via ioctl I guess?

Sacred-Salamander commented 1 year ago

Great PoC

I guess that answers how to use the option

But is it possible to use it with the example I made? As far as I can tell, bubblewrap also creates a nested namespace for me in that case. Is it fundamentally different? Can I not get the intermediate pid from the host?

At this point an fd to --userns userns is only reachable via ioctl I guess?

I don't know, can it not be reachable from procfs?

rusty-snake commented 1 year ago

Can I not get the intermediate pid from the host?

If you share the pidns there is no intermediate pid, because there is no need to fork twice.

I don't know, can it not be reachable from procfs?

I don't know of a way, since we do not know of a process that is a direct member of this userns.

Sacred-Salamander commented 1 year ago

Would it be possible with this example when I also unshare the pidns?

bwrap --unshare-user --unshare-pid --dev-bind / / --proc /proc --tmpfs /run --tmpfs /tmp /bin/bash

Sacred-Salamander commented 1 year ago

About this part:

Bubblewrap sometimes creates nested such user namespaces[1], and to be able to reuse such a setup we need to similarly reuse both namespaces via --userns2.

I think this is what the example/test should show: how to enter it when using bubblewrap rather than the unshare program.

You also said that there is no intermediate pid when you share the pidns, as it doesn't fork twice. I'm not fully following this. Does it mean that my first example does not create a nested user namespace at all? Or just one that I can't see in any way? Does it mean my latest example from the post above is creating it but that I still can't access it? When and how is it possible to use --userns2 with bubblewrap as the initiator, as the commit message suggests?


Missing examples / tests for --userns2 option #542
