Closed bturner closed 3 years ago
I believe reducing the pipe buffer size in `WindowsProcess` fixes the performance problem we're seeing because, once the pipe buffer is full, there's no reason to wait for any more input: there's no room for it anyway. With that (small) delay avoided, the overall operation ends up being much, much faster despite needing to move more buffers.
When used to pump processes that produce lots of output in small chunks (which happens a lot with `git` commands--especially `git http-backend` and `git upload-pack`, which are used to serve clones and fetches), the existing I/O completion handler code in `ProcessCompletions` runs afoul of how IOCP decides when to signal ready reads. When some data is available to read, IOCP waits (a very short wait) to see if more data arrives before triggering. For short-lived processes, or processes that produce their output in big chunks, that works fine. But if the process produces output in many, many tiny pieces, that extra delay amounts to a huge performance hit.

This came up for Bitbucket Server in BSERV-12599, where it was reported that, since we switched over to NuProcess to run `git http-backend` and `git upload-pack`, hosting operations on Windows are an order of magnitude slower than they were using `ProcessBuilder` and blocking I/O.

To give a sense of scale, cloning a 500MB repository (Bitbucket Server's own source) via Bitbucket Server using `ProcessBuilder` looks like this:

480MB at 34MB/s, with the entire operation taking about 25 seconds (the 34MB/s transfer is only part of the overall time).
Switching over to NuProcess completely tanks performance:
We've dropped from 34MB/s to 2MB/s, and the overall operation now takes over 3 minutes. For larger repositories the difference is even more painful, taking clones that previously ran in 30-60 seconds and blowing them out to 10-15 minutes. That results in stacking load on Bitbucket Server that eventually causes rejected requests due to excessive queuing.
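For anyone who wants to take comparable measurements, here is a minimal sketch of a blocking-I/O baseline built on `ProcessBuilder`. It is not the test used for the numbers above: the class name `ThroughputProbe`, its helper methods and the 8192-byte read buffer are assumptions made for illustration, and it does not set up the `stdin` or environment that `git http-backend` needs.

```java
import java.io.IOException;
import java.io.InputStream;

public class ThroughputProbe {
    // Arbitrary read buffer for this sketch (an assumption); this is the size of
    // the Java-side array, not the size of the OS pipe buffer.
    static final int READ_BUFFER_SIZE = 8192;

    /** Reads the stream to EOF with blocking reads and returns the byte count. */
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    /** Converts a byte count and elapsed nanoseconds to MB/s (1MB = 2^20 bytes). */
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        return (bytes / (1024.0 * 1024.0)) / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ThroughputProbe <command> [args...]");
            return;
        }
        // Let stderr pass through so the child cannot block on a full stderr pipe.
        Process process = new ProcessBuilder(args)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        process.getOutputStream().close(); // this sketch sends nothing on stdin

        long start = System.nanoTime();
        long bytes;
        try (InputStream stdout = process.getInputStream()) {
            bytes = drain(stdout, READ_BUFFER_SIZE);
        }
        int exitCode = process.waitFor();
        long elapsed = System.nanoTime() - start;

        System.out.printf("%d bytes, %.1f MB/s, exit code %d%n",
                bytes, megabytesPerSecond(bytes, elapsed), exitCode);
    }
}
```

Run it with any command that writes a lot to `stdout`. The figure it prints covers only the time spent draining the child's output, so it corresponds to the transfer rate rather than to the end-to-end clone time.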
I stripped out all Bitbucket Server's code and wrote a test in NuProcess that runs `git http-backend` directly, with the right `stdin` and environment to produce the same effective operation. (Unfortunately this test isn't really shareable because it relies on some canned `stdin` I captured, as well as access to a specific Git repository.) With that test, I'm able to reproduce the performance issue without any Bitbucket Server code at all. (It's worth noting that the test executed on Linux or macOS performs fine, with NuProcess speeds essentially identical to `ProcessBuilder`.)

In trying to track down the issue, I looked through the JDK's source and found they use `4096 + 24` byte buffers for their pipes. Changing `WindowsProcess.BUFFER_SIZE` from 64K to `4096 + 24` fixes the issue and produces identical throughput with NuProcess compared to `ProcessBuilder`.

A colleague helping me search for this found some other cases where IOCP's Nagle-like approach has caused problems: