Open Mossaka opened 5 months ago
Before youki we did everything ourselves. It may not be necessary to do this anymore.
From https://github.com/containerd/runwasi/pull/327#issuecomment-1730071090, removing that setup_namespaces
call means that the network ns is not set up at all.
At the time I thought that maybe youki wasn't setting up the network ns, but it does.
It would be great if anyone has further insight.
We removed it awhile back and broke everything 🤦 https://github.com/containerd/runwasi/pull/364
youki is only setting the Pid and User namespaces
That doesn't seem right :(
youki is only setting the Pid and User namespaces
That doesn't seem right :(
That was me misreading the code. Youki sets Pid and User ns at one point, and the rest on a different place. IIRC, Pid and User ns need to be set up in the intermediate process, while the rest are set up in the init process.
Youki sets Pid and User ns at one point, and the rest on a different place.
Okay, if this is true, then the shim doesn't need to set up its own network ns. There must be some root causes that we haven't yet discovered.
I've added some logging to runwasi and youki when setting namespaces, and got:
level=info msg=">>>>> runwasi > unshare(CloneFlags(CLONE_NEWNET))"
level=info msg="server listen started"
level=info msg="server started"
level=info msg="Shim successfully started, waiting for exit signal..."
level=debug msg="Got new client"
...
level=warn msg="Controller rdma is not yet implemented."
level=warn msg="Controller misc is not yet implemented."
level=debug msg="unshare or setns: LinuxNamespace { typ: Pid, path: None }"
level=info msg=">>>>> youki > unshare(CloneFlags(CLONE_NEWPID))"
level=debug msg="sending init pid (Pid(1883648))"
level=debug msg="unshare or setns: LinuxNamespace { typ: Uts, path: None }"
level=info msg=">>>>> youki > unshare(CloneFlags(CLONE_NEWUTS))"
level=debug msg="unshare or setns: LinuxNamespace { typ: Ipc, path: None }"
level=info msg=">>>>> youki > unshare(CloneFlags(CLONE_NEWIPC))"
level=debug msg="unshare or setns: LinuxNamespace { typ: Network, path: None }"
level=info msg=">>>>> youki > unshare(CloneFlags(CLONE_NEWNET))"
level=debug msg="unshare or setns: LinuxNamespace { typ: Cgroup, path: None }"
level=info msg=">>>>> youki > unshare(CloneFlags(CLONE_NEWCGROUP))"
level=debug msg="unshare or setns: LinuxNamespace { typ: Mount, path: None }"
level=info msg=">>>>> youki > unshare(CloneFlags(CLONE_NEWNS))"
level=debug msg="prepare rootfs rootfs="/var/lib/docker/rootfs/overlayfs/7c52b78c3a22d2a00cb67cdab9fd090241ef9cdf8f47bb0d7f20daba465662a2""
level=debug msg="mount root fs "/var/lib/docker/rootfs/overlayfs/7c52b78c3a22d2a00cb67cdab9fd090241ef9cdf8f47bb0d7f20daba465662a2""
level=debug msg="mounting Mount { destination: "/proc", typ: Some("proc"), source: Some("proc"), options: Some(["nosuid", "noexec", "nodev"]) }"
...
The lines with >>>>>
is what I've added.
The network namespce is indeed being set twice.
By runwasi when the shim starts, and by youki on the init process.
It looks like after removing the network ns from shim start
, the nginx pod had issues to start:
https://github.com/containerd/runwasi/actions/runs/8148869974/job/22272757552?pr=503#step:8:794
Pid and User ns need to be set up in the intermediate process
Just a drive by comment. The reason for this is that the pid
and user
namespace are a bit special and required special treatment. For the pid
namespace, one can't simply join the pid
namespace. Only the child process can join the new pid
namespace, so we have to fork. In youki
's case, we do this in the intermediate process to fork the init process.
user
namespace is also special. In order to setup user
namespace correctly, we need a parent process to setup the uid/gid mapping. This can't be set once inside the user namespace. So we join the user namespace when we fork the intermediate process from the main process. The intermediate process will join the user namespace and the main
process will setup the uid/gid mapping.
@yihuaf Thanks for giving some context. Is what we are doing now, correct or do we need to make some changes in runwasi?
Ref: https://github.com/containerd/runwasi/blob/71f8df9cc576ea9564b3ea692432e80b454da7e5/crates/containerd-shim-wasm/src/sandbox/shim/cli.rs#L56-L57
It is not clear to me the reason why the shim start function sets up it's own network ns. I checked out both the Go and Rust implementation of runc-shim, they aren't setting the same ns at shim start.
@jsturtevant pointed me to this https://github.com/containers/youki/issues/2473 from youki but it looks like youki was setting up all the namespaces for containers but somehow the network ns was ignored. I am raising the issue here for asking more investigation.
FYI @jprendes