Open sunfishcode opened 3 years ago
but from some experimenting, child processes do inherit non-O_CLOEXEC file descriptors.
Yes, legacy of Unix design continuing to bite. I've seen this problem pop up everywhere; long ago, Firefox had a bug where it leaked the esound fd to child processes which could break sound. In fact today for me running ls -al /proc/self/fd
in the integrated VS Code terminal is an excellent example of this problem; the bash process has open FDs for various fonts and internal VS Code sockets...
One approach that some language runtimes have taken is to by default walk over all open FDs and close them, except the ones that are explicitly (via API) passed to the child process.
The main one I'm thinking of here is Python's subprocess which notice that close_fds=True
by default, and there's an explicit
pass_fds is an optional sequence of file descriptors to keep open between the parent and child. Providing any pass_fds forces close_fds to be True. (POSIX only)
I helped write glib's GSubprocess and it was an explicit choice to do it the same way as Python is doing it; there's an equivalent G_SUBPROCESS_FLAGS_INHERIT_FDS
flag and APIs for explicit FD passing.
While we clearly can't switch Rust's Command
to do this by default, I think we should offer it as an option, and add an explicit API to transfer fds. One thing I only recently realized is that this intersects with the rise of posix_spawn
- if we can take a hard dependency on that, then I think this is just binding posix_spawn_file_actions_adddup2()
.
In the short term...WDYT about offering this API https://github.com/ostreedev/ostree-rs-ext/pull/201/commits/89d3727e2942a512cbfd860f72680526bd4e8c7a somewhere in the rustix ecosystem? If we took a hard dependency on posix_spawn
, then rustix could offer a binding for that?
OK, there is posix_spawn_file_actions_addclose()
but it's problematic to use that to implement "close all the other fds" because of inherent race conditions with other threads opening non-O_CLOEXEC
fds.
There is one other argument for avoiding posix_spawn
for now (aside from the bit that it's a libc thing and would need nontrivial reimplementation outside of libc, and if we do that we might as well only have one) is that in a capability system, I think what we really want to use is something more like Linux's fexecve()
. IOW we always first use openat()
to acquire the binary to execute.
This take_fd_n
looks like a step in a useful direction, though I'd like to brainstorm a bit more around the target: i32
part.
Here's a very rough sketch of the kind of thing I'm thinking about: https://github.com/sunfishcode/intercomm
In particular, see examples/fun.rs. The goal is to make passing file descriptors as easy as passing strings, where users don't need to manually allocate child file descriptor indices.
The code above works with Command which takes a plain path, but I agree that some form of fexecve
/execveat
is nicer. Perhaps a cap_std version of Command is the place to support that.
Hmm. Some overlap with https://docs.rs/procspawn/0.10.0/procspawn/ (cool crate if you aren't aware of it). It also seems like most users would want at least a serde
feature so they can pass anything serializable and not just a hardcoded set? Or dunno, maybe just integers and strings and file descriptors is sensible. (Passing floating point values into CLI apps seems fairly obscure to me)
I definitely love the idea of making it foolproof to pass the file descriptor and the string value with its number as a single unit from the caller's PoV. I have a small concern that there might be programs that want the fd in a special way. At least for my particular use case it looks like --sockfd=N
or --sockfd N
(i.e. "normal" CLI style) so the latter would work with this.
I'm a bit uncertain; this approach clearly has more safety, but there's a lot more logic (and opinions) going on, and it seems to me it could make sense to at least offer something close to my proposed lower level API which is still arguably safe?
For now I created https://github.com/cgwalters/cap-std-ext - I'm thinking to centralize more things there too as a staging ground. (xref https://github.com/bytecodealliance/cap-std/issues/211 too)
That makes sense for now. I think I'm getting bogged down here because this whole topic ties into some big-picture goals for me.
One off the things that's tricky is that after the user calls take_fd_n
, they have to somehow arrange for the child process to know which file descriptor to open, and do the equivalent of from_raw_fd
on it, which is unsafe. To get rid of that unsafe, one option is to have a command-line parser that the child could use that could deserialize into something like InterTypeable
values in a way that could encapsulate the unsafety. So this is leading me to think about serde-like APIs, and perhaps there's a serde compat layer, but it would need to ultimately be a separate system because serde's data model doesn't have handles.
Yeah. In my case though the child process is Go, not Rust. So...none of that applies unfortunately.
(If you are idly curious, this is used by https://github.com/containers/containers-image-proxy-rs/ which is part of my active project in pulling operating system updates via the containers/image stack, and the host process here is increasingly Rust with a large unfortunate swath of C/C++, and I definitely didn't want to add Go into the address space too :wink: )
but it would need to ultimately be a separate system because serde's data model doesn't have handles.
Tangentially related, https://docs.rs/ipc-channel/ supports adding channels into messages over its own channels. I didn't deep dive but I think this works by adding it to an out-of-band queue; since the crate is also responsible for serializing and doing writing, it can handle that too.
OK https://crates.io/crates/cap-std-ext exists now on crates.io. I also worked on an O_TMPFILE API.
What do you think? Basically I (or we, and others) can iterate on some higher level APIs there, and then migrate them down into cap-std where it makes sense.
Looks good! And yes, I think this makes sense. There are a lot of utility functions that could potentially go in such a crate, so let's see where it goes.
Following up on a discussion in #97:
std::process::Command
has a safe API for spawning child processes. Its documentation doesn't mention non-O_CLOEXEC
file descriptors, but from some experimenting, child processes do inherit non-O_CLOEXEC
file descriptors.Does this violate I/O safety?
When a program calls
Command::spawn
, all existing memory in the process is replaced with a new program image, so Rust's safety guarantees have typically considered that to be the end of one program and the beginning of a new one. However, this is an area where file descriptors are different from pointers. FIle descriptors not markedO_CLOEXEC
persist into the next program. In effect,spawn
is implicitly obtaining the values of all such file descriptors in the process, even those meant to be encapsulated, and "passing" them to another program where they could be accessed. That has the shape of an I/O safety violation. And in practice, leaking file descriptors to child processes is a real security problem.One possible way out of this is to say that safe code can't create non-
O_CLOEXEC
file descriptors. Rust's std already does open all file descriptors asO_CLOEXEC
whenever it can, and sets them toO_CLOEXEC
even when it can't create them that way. Perhaps then, rustix'sopenat
function should always setO_CLOEXEC
too, and perhaps there should be a special function named something likeunsafe fn dup2_no_cloexec
for creating non-O_CLOEXEC
file descriptors for when those are really needed.One tricky issue is that not all OS's can set
O_CLOEXEC
atomically in all cases. Rust usesioctl
withFIOCLEX
to set theCLOEXEC
flag as soon as it can, but there is a window where another thread could callexec
and capture the file descriptor before it hasO_CLOEXEC
set. A related issue is that users ofstd::process::Command
may already be creating non-O_CLOEXEC
file descriptors within safe functions. A possible long-term path out of this is to say that post-exec
programs accessing file descriptors passed throughexec
is in the same category as using a debugger on the pre-exec
program, so it's out of scope for I/O safety, and then to provide a new safe way to pass file descriptors to child processes and gradually encourage the Rust ecosystem to adopt it.