Shopify / pitchfork


Implement listen queues for fairer load balancing #127

casperisfine closed this 4 months ago

casperisfine commented 5 months ago

Closes: https://github.com/Shopify/pitchfork/issues/71

Linux's epoll+accept queue is fundamentally LIFO (see a good writeup at https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/).

Because of this, neither Unicorn nor Pitchfork properly balances load between workers: unless the deployment is at capacity, the first workers handle disproportionately more work.
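To make the failure mode concrete, here is a minimal sketch (not pitchfork's actual code) of the traditional pre-fork model in which every worker accepts from one shared listen socket; under light load the kernel tends to wake the most recently blocked worker, so the same few workers keep winning accept():

require "socket"

# One listen socket, shared by every forked worker.
listener = TCPServer.new(8080)

16.times do
  fork do
    loop do
      client = listener.accept # all workers block on the same accept queue
      client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
      client.close
    end
  end
end

Process.waitall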

For example, running: wrk -c 4 -t 4 'http://localhost:8080/'

With the master branch and config:

listen 8080
worker_processes 16

Shows a big imbalance in the number of requests handled by each worker:

worker[0] - requests: 49131
worker[1] - requests: 46997
worker[2] - requests: 44023
worker[3] - requests: 38420
worker[4] - requests: 14945
worker[5] - requests: 1742
worker[6] - requests: 91
worker[7] - requests: 12
worker[8] - requests: 0
worker[9] - requests: 0
worker[10] - requests: 0
worker[11] - requests: 1
worker[12] - requests: 0
worker[13] - requests: 0
worker[14] - requests: 0
worker[15] - requests: 0

In some ways this behavior can be useful, but in others it may be undesirable. Most notably, it can create a situation where some workers are only used when there is a spike of traffic, and when that spike happens, it hits colder workers.

To work around this issue, we can create multiple file descriptors for a single port, and limit each worker to a subset of those file descriptors. Linux will then round-robin incoming connections between these queues.
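A rough sketch of that idea, assuming SO_REUSEPORT is used to bind several sockets to the same port (the constants and the queue-to-worker assignment below are illustrative, not necessarily what this PR implements):

require "socket"

QUEUES = 8
QUEUES_PER_WORKER = 2
WORKERS = 16

# Open several listening sockets ("queues") on the same port.
queues = Array.new(QUEUES) do
  sock = Socket.new(:INET, :STREAM)
  sock.setsockopt(:SOCKET, :REUSEADDR, true)
  sock.setsockopt(:SOCKET, :REUSEPORT, true) # several sockets may share the port
  sock.bind(Addrinfo.tcp("0.0.0.0", 8080))
  sock.listen(1024)
  sock
end

WORKERS.times do |worker_nr|
  fork do
    # Each worker accepts only from its own slice of the queues; the kernel
    # distributes new connections across the queues, spreading the load.
    mine = Array.new(QUEUES_PER_WORKER) { |i| queues[(worker_nr + i) % QUEUES] }
    loop do
      ready, = IO.select(mine)
      ready.each do |sock|
        client, _addr = sock.accept_nonblock(exception: false)
        next if client == :wait_readable
        client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        client.close
      end
    end
  end
end

Process.waitall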

Running the same benchmark with this branch and config:

listen 8080, queues: 8, queues_per_worker: 2
worker_processes 16

Result:

worker[0] - requests: 31580
worker[1] - requests: 25191
worker[2] - requests: 23575
worker[3] - requests: 22915
worker[4] - requests: 23101
worker[5] - requests: 23020
worker[6] - requests: 22948
worker[7] - requests: 14415
worker[8] - requests: 4084
worker[9] - requests: 2046
worker[10] - requests: 1774
worker[11] - requests: 1638
worker[12] - requests: 1755
worker[13] - requests: 1604
worker[14] - requests: 1160
worker[15] - requests: 179

The above example still doesn't result in perfectly fair load balancing, but that could be achieved by creating even more queues. The goal, however, isn't perfectly fair load balancing, simply to ensure every worker gets a chance to do some minimal warmup.
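For instance, assuming the new options allow a 1:1 mapping, a configuration along these lines would dedicate one queue to each worker and let the kernel spread connections evenly across them:

listen 8080, queues: 16, queues_per_worker: 1
worker_processes 16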