Open jprendes opened 1 year ago
Some context:
The code we run in our impl of the libcontainer
's Executor
is run in a clone
d process.
In POSIX, the kind of code that is guaranteed to run safely in the child process after a clone
is restricted to async-signal-safe code. Examples of code that are NOT guaranteed to be AS-safe include malloc
, printf
and pthread_*
.
See for example how the snippet in nix::fork uses write(..)
in the child process instead of println!(..)
exactly because of this.
It seems that in our case with glibc and musl, it works fine, but this is fragile.
I'm thinking that a possible approach would be to use the async api of containerd/rust-extensions and have the shim be a single threaded process.
This is assuming that the way rust-extensions handles async is single threaded.
Hey @jprendes I know you've worked on the asyncify runwasi, do you have any updates on that one? Do you think we need another issue to track the progress of it?
I can think of two options to workaround this issue:
That's what normal for-exec does, and what a normal container does. The disadvantage is that we loose any sharing between the shim and the container (e.g., anything pre-initialized by the shim won't seen by the container). I know @cpuguy83 would like to see all containers in a pod sharing the same runtime, and that would become impossible with this approach (admitedly, that's not the direction we are heading anyway with using youki). The advantage is that for windows we would need to do that anyway (IIUC, in windows there's no fork and exec, but rather fork+exec as one single thing), and having a similar behavior in all platforms might be nice.
I would eventually like to see that regardless of this issue.
The disadvantage is that it's a bit more difficult to enforce, as even some tokio operations might inadvertently spin new threads. Also the async feature in rust-extensions
still spins a few threads, and is not supported on Windows.
The advantage, is that I would expect lower resource usage by the shim, better integration of the containerd client code, and would fix this issue without preventing sharing resources between shim and container.
I think exec is probably the way to go for now. I think we can go forward with that and then look at how we might optimize things. e.g. can we send rpc's into that container to do things like handle (container) execs.
Thinking further about it, we can't exec, as the target binary would be the shim binary, which is not available inside the container. AFAIK, the single threaded process would be the only solution here.
Thinking further about it, we can't exec, as the target binary would be the shim binary, which is not available inside the container. AFAIK, the single threaded process would be the only solution here.
@jsturtevant , would this be a problem for the Windows implementation you had in mind (where it would crate a new child process)?
I'm 👎 on using async as a way to work around this. This is a (potential) fix by coincidence and not a proper one.
I've said this a few times, but I think the abstraction we are using from youki is doing more than what we actually need (or perhaps isn't flexible enough) for wasi containers.
I've said this a few times, but I think the abstraction we are using from youki is doing more than what we actually need (or perhaps isn't flexible enough) for wasi containers.
Yes, I agree with a part of @cpuguy83 in this option. Sorry, I don't have enough knowledge to speak aync way.
When I incorporated youki into runwasi at that time https://github.com/containerd/runwasi/issues/23 it was still difficult to consider the future what would happen. This is because we did not know what would be needed. Now we know what we need in the wasi container more concrete. So we may be able to list how OCI compliant we need to be, i.e., which features we need and which we don't.
Then, we may be able to easily build a wasi container directly using functions in a lower layer of libcontainer
instead of using the Builder pattern of libcontainer
. For a simple example, we can use cgroup and so on. Also, you can use the rootfs functions from youki. If you want to do that I can consider it once and I'd be happy to help you with it.
If you want to be OCI compliant, I think it would be best to go in the direction of extensibility using the Builder pattern. So there is a tradeoff there.
Originally posted by @jprendes in https://github.com/containerd/runwasi/issues/347#issuecomment-1755433335