Reduce pipe buffer size on Windows (fixes #117).

Using a large buffer for the pipe connected to the child process can cause IOCP to impose (very short) delays before notifying the port that a read has completed, waiting in case additional data arrives. For a small number of reads these delays are harmless, but for processes that produce hundreds of megabytes of output (especially in small bursts, like git http-backend and git upload-pack) they accumulate into a massive performance hit, reducing throughput compared to ProcessBuilder by an order of magnitude (30MB/s+ down to 2MB/s).
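
As a rough illustration of how the delays add up, the sketch below models each read as its raw transfer time plus a fixed notification delay. The class, method, and parameter names are hypothetical, and the delay is a free parameter rather than a measured IOCP value.

```java
// Hypothetical model, not NuProcess code: every read pays a fixed
// notification delay on top of the time needed to move its bytes.
final class PipeDelayEstimate {

    /**
     * Effective throughput in bytes/second when each read of bytesPerRead
     * bytes incurs delayNanos of extra latency before it is reported.
     */
    static double effectiveBytesPerSecond(long bytesPerRead,
                                          double rawBytesPerSecond,
                                          long delayNanos) {
        double transferSeconds = bytesPerRead / rawBytesPerSecond;
        double delaySeconds = delayNanos / 1_000_000_000.0;
        // Total time per read is transfer time plus the added delay.
        return bytesPerRead / (transferSeconds + delaySeconds);
    }
}
```

For example, with 4096-byte reads, a raw rate of 30,000,000 bytes/s, and an assumed (not measured) 2 ms delay per read, the model yields roughly 1.9 MB/s, the same order of slowdown as the one reported above.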

Using a smaller pipe buffer does limit how much data a single ReadFile or WriteFile call can move, but it also prevents the delays (there is no point in waiting for more input once the buffer is already full), resulting in better and, more importantly, more consistent performance overall.

- Reduced WindowsProcess.BUFFER_SIZE from 64K to 4096+24, which matches the buffer size the JDK's ProcessImpl uses on Windows
- Replaced System.err with a Logger when writing errors
  - This matches how LinuxProcess and OsxProcess are written
- Optimized HANDLE.fromNative to check the pointer directly, to avoid creating HANDLE instances for invalid handles
  - This also avoids the base implementation's use of reflection to instantiate new HANDLE instances
- Updated PipeBundle to set auto-sync to false on OVERLAPPED instances so JNA won't waste time marshaling to/from native code around every call to ReadFile or WriteFile
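
The fromNative change can be pictured with the stand-alone sketch below. It uses a plain long in place of a JNA Pointer, and HandleSketch is a hypothetical stand-in, not the actual HANDLE class. Returning null for a NULL pointer and a shared instance for INVALID_HANDLE_VALUE are assumptions about one reasonable way to avoid per-call allocation.

```java
// Hypothetical stand-in for a JNA-style handle wrapper; not the real HANDLE.
final class HandleSketch {
    // Win32 defines INVALID_HANDLE_VALUE as (HANDLE)-1.
    static final long INVALID_HANDLE_VALUE = -1L;

    // One shared wrapper for the invalid value, created once.
    static final HandleSketch INVALID = new HandleSketch(INVALID_HANDLE_VALUE);

    final long pointer;

    private HandleSketch(long pointer) {
        this.pointer = pointer;
    }

    /**
     * Converts a raw native pointer value to a wrapper. The raw value is
     * inspected first, so NULL and invalid handles never allocate, and no
     * reflection is needed to instantiate the wrapper for valid handles.
     */
    static HandleSketch fromNative(long rawPointer) {
        if (rawPointer == 0L) {
            return null; // NULL pointer: nothing to wrap
        }
        if (rawPointer == INVALID_HANDLE_VALUE) {
            return INVALID; // reuse the shared instance
        }
        return new HandleSketch(rawPointer); // direct constructor call
    }
}
```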

brettwooldridge/NuProcess#118