std::process::Command and I/O safety

sunfishcode commented 3 years ago

Following up on a discussion in #97: std::process::Command has a safe API for spawning child processes. Its documentation doesn't mention non-O_CLOEXEC file descriptors, but from some experimenting, child processes do inherit non-O_CLOEXEC file descriptors.

Does this violate I/O safety?

When a program calls Command::spawn, all existing memory in the process is replaced with a new program image, so Rust's safety guarantees have typically considered that to be the end of one program and the beginning of a new one. However, this is an area where file descriptors are different from pointers. FIle descriptors not marked O_CLOEXEC persist into the next program. In effect, spawn is implicitly obtaining the values of all such file descriptors in the process, even those meant to be encapsulated, and "passing" them to another program where they could be accessed. That has the shape of an I/O safety violation. And in practice, leaking file descriptors to child processes is a real security problem.

One possible way out of this is to say that safe code can't create non-O_CLOEXEC file descriptors. Rust's std already does open all file descriptors as O_CLOEXEC whenever it can, and sets them to O_CLOEXEC even when it can't create them that way. Perhaps then, rustix's openat function should always set O_CLOEXEC too, and perhaps there should be a special function named something like unsafe fn dup2_no_cloexec for creating non-O_CLOEXEC file descriptors for when those are really needed.

One tricky issue is that not all OS's can set O_CLOEXEC atomically in all cases. Rust uses ioctl with FIOCLEX to set the CLOEXEC flag as soon as it can, but there is a window where another thread could call exec and capture the file descriptor before it has O_CLOEXEC set. A related issue is that users of std::process::Command may already be creating non-O_CLOEXEC file descriptors within safe functions. A possible long-term path out of this is to say that post-exec programs accessing file descriptors passed through exec is in the same category as using a debugger on the pre-exec program, so it's out of scope for I/O safety, and then to provide a new safe way to pass file descriptors to child processes and gradually encourage the Rust ecosystem to adopt it.

cgwalters commented 2 years ago

but from some experimenting, child processes do inherit non-O_CLOEXEC file descriptors.

Yes, legacy of Unix design continuing to bite. I've seen this problem pop up everywhere; long ago, Firefox had a bug where it leaked the esound fd to child processes which could break sound. In fact today for me running ls -al /proc/self/fd in the integrated VS Code terminal is an excellent example of this problem; the bash process has open FDs for various fonts and internal VS Code sockets...

One approach that some language runtimes have taken is to by default walk over all open FDs and close them, except the ones that are explicitly (via API) passed to the child process.

The main one I'm thinking of here is Python's subprocess which notice that close_fds=True by default, and there's an explicit

pass_fds is an optional sequence of file descriptors to keep open between the parent and child. Providing any pass_fds forces close_fds to be True. (POSIX only)

I helped write glib's GSubprocess and it was an explicit choice to do it the same way as Python is doing it; there's an equivalent G_SUBPROCESS_FLAGS_INHERIT_FDS flag and APIs for explicit FD passing.

While we clearly can't switch Rust's Command to do this by default, I think we should offer it as an option, and add an explicit API to transfer fds. One thing I only recently realized is that this intersects with the rise of posix_spawn - if we can take a hard dependency on that, then I think this is just binding posix_spawn_file_actions_adddup2().

In the short term...WDYT about offering this API https://github.com/ostreedev/ostree-rs-ext/pull/201/commits/89d3727e2942a512cbfd860f72680526bd4e8c7a somewhere in the rustix ecosystem? If we took a hard dependency on posix_spawn, then rustix could offer a binding for that?

cgwalters commented 2 years ago

OK, there is posix_spawn_file_actions_addclose() but it's problematic to use that to implement "close all the other fds" because of inherent race conditions with other threads opening non-O_CLOEXEC fds.

There is one other argument for avoiding posix_spawn for now (aside from the bit that it's a libc thing and would need nontrivial reimplementation outside of libc, and if we do that we might as well only have one) is that in a capability system, I think what we really want to use is something more like Linux's fexecve(). IOW we always first use openat() to acquire the binary to execute.

sunfishcode commented 2 years ago

This take_fd_n looks like a step in a useful direction, though I'd like to brainstorm a bit more around the target: i32 part.

Here's a very rough sketch of the kind of thing I'm thinking about: https://github.com/sunfishcode/intercomm

In particular, see examples/fun.rs. The goal is to make passing file descriptors as easy as passing strings, where users don't need to manually allocate child file descriptor indices.

The code above works with Command which takes a plain path, but I agree that some form of fexecve/execveat is nicer. Perhaps a cap_std version of Command is the place to support that.

cgwalters commented 2 years ago

Hmm. Some overlap with https://docs.rs/procspawn/0.10.0/procspawn/ (cool crate if you aren't aware of it). It also seems like most users would want at least a serde feature so they can pass anything serializable and not just a hardcoded set? Or dunno, maybe just integers and strings and file descriptors is sensible. (Passing floating point values into CLI apps seems fairly obscure to me)

I definitely love the idea of making it foolproof to pass the file descriptor and the string value with its number as a single unit from the caller's PoV. I have a small concern that there might be programs that want the fd in a special way. At least for my particular use case it looks like --sockfd=N or --sockfd N (i.e. "normal" CLI style) so the latter would work with this.

I'm a bit uncertain; this approach clearly has more safety, but there's a lot more logic (and opinions) going on, and it seems to me it could make sense to at least offer something close to my proposed lower level API which is still arguably safe?

cgwalters commented 2 years ago

For now I created https://github.com/cgwalters/cap-std-ext - I'm thinking to centralize more things there too as a staging ground. (xref https://github.com/bytecodealliance/cap-std/issues/211 too)

sunfishcode commented 2 years ago

That makes sense for now. I think I'm getting bogged down here because this whole topic ties into some big-picture goals for me.

One off the things that's tricky is that after the user calls take_fd_n, they have to somehow arrange for the child process to know which file descriptor to open, and do the equivalent of from_raw_fd on it, which is unsafe. To get rid of that unsafe, one option is to have a command-line parser that the child could use that could deserialize into something like InterTypeable values in a way that could encapsulate the unsafety. So this is leading me to think about serde-like APIs, and perhaps there's a serde compat layer, but it would need to ultimately be a separate system because serde's data model doesn't have handles.

cgwalters commented 2 years ago

Yeah. In my case though the child process is Go, not Rust. So...none of that applies unfortunately.

(If you are idly curious, this is used by https://github.com/containers/containers-image-proxy-rs/ which is part of my active project in pulling operating system updates via the containers/image stack, and the host process here is increasingly Rust with a large unfortunate swath of C/C++, and I definitely didn't want to add Go into the address space too :wink: )

cgwalters commented 2 years ago

but it would need to ultimately be a separate system because serde's data model doesn't have handles.

Tangentially related, https://docs.rs/ipc-channel/ supports adding channels into messages over its own channels. I didn't deep dive but I think this works by adding it to an out-of-band queue; since the crate is also responsible for serializing and doing writing, it can handle that too.

cgwalters commented 2 years ago

OK https://crates.io/crates/cap-std-ext exists now on crates.io. I also worked on an O_TMPFILE API.

What do you think? Basically I (or we, and others) can iterate on some higher level APIs there, and then migrate them down into cap-std where it makes sense.

sunfishcode commented 2 years ago

Looks good! And yes, I think this makes sense. There are a lot of utility functions that could potentially go in such a crate, so let's see where it goes.

bytecodealliance / rustix

std::process::Command and I/O safety #100