Using a large buffer size for the pipe connected to the child process can result in IOCP imposing (very short) delays before notifying the port when a read completes, waiting in case additional data arrives. For a small number of reads the short delays are no problem. For processes that produce hundreds of megabytes of output, however (especially if they do so in small bursts, like git http-backend and git upload-pack), all those short delays accumulate into a massive performance hit, reducing throughput compared to ProcessBuilder by an order of magnitude (from 30+ MB/s down to 2 MB/s).
Using a smaller pipe buffer does impose some constraint on how much data can be moved by a single ReadFile or WriteFile call but it also prevents delays (because there's no point in waiting for more input if the buffer is already full), resulting in overall improved--and, more importantly, more consistent--performance.
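To put rough numbers on that trade-off, here is a minimal arithmetic sketch. It uses the two buffer sizes involved in this change (64K and 4096+24) and the throughput figures quoted above. The 300 MB payload and the PipeBufferMath class are assumptions made up for illustration; they are not part of the change itself.

```java
// Back-of-the-envelope arithmetic for the pipe buffer trade-off.
// PipeBufferMath is a hypothetical helper, not a class in the codebase.
final class PipeBufferMath {
    static final long LARGE_BUFFER = 64 * 1024;  // previous size, 64K
    static final long SMALL_BUFFER = 4096 + 24;  // new size, 4120 bytes
    static final long MB = 1024 * 1024;

    /** Fewest reads that can move payloadBytes if every read fills the buffer. */
    static long minReads(long payloadBytes, long bufferSize) {
        return (payloadBytes + bufferSize - 1) / bufferSize; // ceiling division
    }

    /** Seconds needed to move payloadBytes at a sustained rate given in MB/s. */
    static double seconds(long payloadBytes, double megabytesPerSecond) {
        return payloadBytes / (megabytesPerSecond * MB);
    }

    public static void main(String[] args) {
        long payload = 300 * MB; // assumed payload size, for illustration only
        System.out.println(minReads(payload, LARGE_BUFFER)); // 4800
        System.out.println(minReads(payload, SMALL_BUFFER)); // 76353
        System.out.println(seconds(payload, 30.0));          // 10.0
        System.out.println(seconds(payload, 2.0));           // 150.0
    }
}
```

The smaller buffer needs roughly 16 times as many calls at minimum, but at the quoted rates the same payload takes 10 seconds instead of 150, so the extra calls are cheap next to the accumulated notification delays.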
Reduced WindowsProcess.BUFFER_SIZE from 64K to 4096+24, which matches the buffer size the JDK's ProcessImpl uses on Windows
Replaced System.err with a Logger when writing errors, matching how LinuxProcess and OsxProcess are written
Optimized HANDLE.fromNative to check the pointer directly, which avoids creating HANDLE instances for invalid handles and also avoids the base implementation's use of reflection to instantiate new HANDLE instances
Updated PipeBundle to set auto-sync to false on OVERLAPPED instances so JNA won't waste time marshaling to/from native code around every call to ReadFile or WriteFile
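The idea behind the HANDLE.fromNative change can be sketched without JNA. The HandleSketch class below is a hypothetical stand-in that models the native pointer as a long. The -1 sentinel is the Windows INVALID_HANDLE_VALUE; returning null for a zero pointer is an assumption of this sketch, not something taken from the actual code.

```java
// Stand-in for a JNA handle type; the real class wraps a com.sun.jna.Pointer.
final class HandleSketch {
    // Windows represents INVALID_HANDLE_VALUE as a pointer with value -1.
    static final long INVALID = -1L;

    // One shared instance for the invalid handle, so none is allocated per call.
    static final HandleSketch INVALID_HANDLE = new HandleSketch(INVALID);

    final long pointer;

    HandleSketch(long pointer) {
        this.pointer = pointer;
    }

    /** Converts a raw native value, checking it before allocating anything. */
    static HandleSketch fromNative(long rawPointer) {
        if (rawPointer == 0L) {
            return null;             // assumption: a null pointer maps to null
        }
        if (rawPointer == INVALID) {
            return INVALID_HANDLE;   // reuse the shared sentinel
        }
        return new HandleSketch(rawPointer); // direct constructor, no reflection
    }
}
```

Because the sentinel is shared, callers can compare against it by identity, and the only allocation happens on the path where a valid handle is returned.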