Closed bturner closed 3 years ago
I'm hoping to put together a pull request (or pull requests) for these issues in the next couple days.
The Linux issue with leaked pipe descriptors has actually caused an outage for us in production internally, as well as for a fairly large customer. In both cases the system ran out of file descriptors through no direct fault of NuProcess's; Bitbucket Server was simply running too many concurrent processes. Once that happens, though, it's basically impossible for the system to ever recover without a restart, because each new failure to start a process leaks 3 more pipe handles. Similar situations with `ProcessBuilder` can recover on their own; once the load subsides and the number of concurrent processes drops, things are stable again. With the pipe leaks, even as load declines the number of free descriptors doesn't increase.
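For anyone who wants to watch for this from inside the JVM rather than with external tools, here is a minimal sketch. It assumes a HotSpot-style JDK on a Unix-like OS, where the platform bean implements `com.sun.management.UnixOperatingSystemMXBean`. The class name `DescriptorCount` is made up for illustration and is not part of NuProcess.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class DescriptorCount {
    /**
     * Returns the number of file descriptors this JVM currently has open,
     * or -1 when the platform bean does not expose that figure
     * (for example on Windows).
     */
    static long openDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open descriptors: " + openDescriptors());
    }
}
```

Sampling this value before and after a batch of failed process starts should show whether the count drops back once load subsides or keeps climbing.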
Raised pull request #120 for the Linux pipe part of the fix. The issues with macOS and Windows are much more subtle (on Windows I'm actually not certain there are any issues, at this point), and would involve failures of system calls other than `posix_spawnp` or `CreateProcessW`, so I'm not certain they're worth the complexity of the changes it would take to fix them.
The exact codepaths vary, but for every supported platform there are cases where a failure to start a process leaks descriptors or handles.
For example, on Linux, if `forkAndExec` throws an exception, `closePipes()` is never called and the 3 `*Widow` descriptors are leaked. You can confirm this with NuProcess's own test suite:
- Pause in `RunTest.noExecutableFound()` after the call to `pb.start()` (for example, with a debugger breakpoint)
- Run `lsof` on the `java` process
At the end of the `lsof` output there will be entries for the 3 leaked pipes (the IDs will likely vary, of course). If you use the debugger to manually call `process.closePipes()` and then run `lsof` again, you'll see that the 3 pipes are closed.
That specific codepath is handled correctly on macOS and all the pipes are closed. However, `createPosixPipes()` is called outside the `try`/`catch` in `run` and `start`, so if there are any errors in `createPosixPipes()` after it calls `createPipes()`, the pipes are leaked. (This case is much less likely than the Linux one, to be clear; I'm just noting it's an edge case that exists.)
`WindowsProcess` has similar issues with handles it creates in `createPipes()`.
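To illustrate the general shape of the problem and one possible remedy, here is a minimal, self-contained sketch. It is not NuProcess's actual code and not necessarily what pull request #120 does. `SpawnCleanupSketch`, `FakePipe`, `spawn` and `openPipeCount` are hypothetical stand-ins, and the "pipes" are plain objects rather than real descriptors. The idea is simply that pipe creation and the spawn attempt both sit inside the `try`, so any failure after the first pipe exists still reaches `closePipes()`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SpawnCleanupSketch {
    /** Hypothetical stand-in for a pipe descriptor; records whether it was closed. */
    static final class FakePipe implements AutoCloseable {
        boolean open = true;

        @Override
        public void close() {
            open = false;
        }
    }

    final List<FakePipe> pipes = new ArrayList<>();

    /** Creates the three pipes (stdin, stdout, stderr) a child would need. */
    void createPipes() {
        for (int i = 0; i < 3; i++) {
            pipes.add(new FakePipe());
        }
    }

    /** Closes whatever pipes were created; safe to call after a partial setup. */
    void closePipes() {
        for (FakePipe pipe : pipes) {
            pipe.close();
        }
    }

    /** Hypothetical spawn step that always fails, like a missing executable. */
    void spawn(String command) throws IOException {
        throw new IOException("No such file or directory: " + command);
    }

    /** Returns true if the process started; on any failure the pipes are closed first. */
    boolean start(String command) {
        try {
            createPipes();
            spawn(command);
            return true;
        } catch (IOException | RuntimeException e) {
            closePipes();
            return false;
        }
    }

    /** Number of pipes still open, for checking that nothing leaked. */
    long openPipeCount() {
        return pipes.stream().filter(pipe -> pipe.open).count();
    }
}
```

With this arrangement a failed `start` leaves `openPipeCount()` at zero, which is the behavior the `lsof` check above is looking for on the real descriptors.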