Open nico opened 2 years ago
Having thought about this a bit, I am curious about the "spawning subprocesses takes some time" part. If it's a skip-a-frame kind of latency, like up to 100ms occasionally, then it's probably fine to serialize it in the main process, though I guess that would interact poorly with -j1000. If it's a multi second kind of thing then definitely it needs a different approach.
I'd probably just go with a mutex around calling `Command::spawn` (instead of `::output`) for now, with an eye towards rewriting to `ppoll` eventually (or tokio maybe, but I'm kinda skeptical of async Rust).
> `Command::spawn()` in the rust stdlib unconditionally calls `anon_pipe` here
I think it is unlikely that FDs from this pipe are leaked.
Both ends are short-lived and explicitly closed: the write end is closed immediately after fork() returns, and the read end is closed as soon as the child has called exec().
So the only situation it could cause a problem is when the process literally has 100s of forked-but-not-yet-exec'ed children. Theoretically possible, but it sounds like a stretch to me.
Just throwing it out there: libstd uses `posix_spawn()` under the right conditions (instead of `fork` + `execve`), and macOS has a `POSIX_SPAWN_CLOEXEC_DEFAULT` attribute that does what its name suggests. Teaching libstd about it probably isn't too hard.
we made it async in chromium at some point for that reason
@nico I'm interested, can you say more? My understanding is that as long as you're using fork, there's no getting away from paying the cost of copying the working set.
> I think it is unlikely that FDs from this pipe are leaked.
It's possible this theory is wrong. Background is that n2 needs way more FDs than ninja (`ninja -j250` works fine on my mac, `n2 -j250` dies with "Too many open files (os error 24)"), and I compared fs_usage logs.
I ran something like `sudo fs_usage -w -f filesys | grep n2 > fs-n2.txt` and then ran `n2 -j50` for the build file generated by that python script. Then I did the same with ninja, and compared the two txt files.
This was the main difference that stood out, and there is a race there. So it seems plausible to me, but it's well possible this is not the root cause. (It does fit with things seemingly working fine on Linux -- but Linux also has a higher FD limit.)
> @nico I'm interested, can you say more?
Turns out I was confusing things! We didn't do this in chromium, we were talking about doing this in ninja, here: https://bugs.chromium.org/p/chromium/issues/detail?id=829710#c3 (but as far as I know didn't end up merging it)
IIRC the background there was that someone was running ninja on a Windows system with some thing installed that made process creation very slow. But if you're in that situation, your build is going to be slow anyways and it's imho not necessarily something tools should go out of their way to mitigate.
So it's probably fine to just start processes on the main thread.
With that patch applied, and the build.ninja file generated by the script in https://github.com/evmar/n2/issues/14#issuecomment-1084646807, I can run `n2 -j250` and it works just as well as ninja. I can also build llvm with `n2 -j250` using goma, just like with ninja.
(I also filed an upstream issue.)
I think 9c2104a3ca3b6225b126ac193213c2642e1a7147 regressed this. The test in https://github.com/evmar/n2/issues/14#issuecomment-1084646807 repros this again.
(I don't have permissions to reopen this issue here.)
`Command::spawn()` in the rust stdlib unconditionally calls `anon_pipe` here: https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/process/process_unix.rs#L62

`anon_pipe` on Linux calls `pipe2` to set `CLOEXEC` on the pipe atomically: https://github.com/rust-lang/rust/blob/521734787ecf80ff12df7ca5998f7ec0b3b7b2c9/library/std/src/sys/unix/pipe.rs#L18

But macOS has no `pipe2`, so here the stdlib instead calls `pipe()` followed by `set_cloexec`: https://github.com/rust-lang/rust/blob/521734787ecf80ff12df7ca5998f7ec0b3b7b2c9/library/std/src/sys/unix/pipe.rs#L35

This means there's a window where the pipe has been created but cloexec isn't set on the pipe's FDs yet. If a different thread forks in that window, the pipe's FDs get leaked into the forked child.

(Not that it matters, but `set_cloexec` is here: https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/fd.rs#L204 . On macOS and a few other OSes, it calls `ioctl` with `FIOCLEX`, which is just one syscall. On Linux and a few others, it calls `fcntl` with `F_GETFD`/`F_SETFD`, which is two syscalls, but we don't need to call `set_cloexec` at all on this path on Linux.)

This is likely why `n2 -j250` runs out of FDs on macOS, while `ninja -j250` works fine.

Possible fixes:

- Don't fork on background threads. Create the subprocess on the main thread, then hand the created subprocess to a background thread that waits on it and does deps processing.
- Don't use a thread per subprocess at all; use a select-loop model instead (but still use threads for deps processing).
- Don't use the `Command` object on non-Windows either (we already don't on Windows), and add mutexes around `fork` and `pipe`/`set_cloexec`.
Spawning subprocesses takes some time (~~we made it async in chromium at some point for that reason~~ we looked at making it async in ninja for that reason), so moving it to the main thread would add some amount of work onto the critical path of n2.