Latency increases when threads > 1

Hi there,

I'm trying to understand why I would be seeing a jump in latency when the thread count is greater than 1.

On a c5.2xlarge EC2 instance with 4 physical CPUs (8 logical):

1 thread:

$ ./wrk -t1 -d30s -c100 -R100 http://myhost
Running 30s test @ http://myhost
  1 threads and 100 connections
  Thread calibration: mean lat.: 24.837ms, rate sampling interval: 57ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    23.23ms    4.27ms  49.63ms   84.20%
    Req/Sec    98.30     96.55   214.00     23.68%
  3001 requests in 30.02s, 832.31KB read
Requests/sec:     99.96
Transfer/sec:     27.72KB

4 threads:

$ ./wrk -t4 -d30s -c100 -R100 http://myhost
Running 30s test @ http://myhost
  4 threads and 100 connections
  Thread calibration: mean lat.: 189.007ms, rate sampling interval: 647ms
  Thread calibration: mean lat.: 341.843ms, rate sampling interval: 904ms
  Thread calibration: mean lat.: 336.302ms, rate sampling interval: 892ms
  Thread calibration: mean lat.: 343.629ms, rate sampling interval: 894ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   315.79ms  115.94ms 506.88ms   53.75%
    Req/Sec    24.92     10.10    38.00     70.83%
  2975 requests in 30.13s, 825.10KB read
Requests/sec:     98.74
Transfer/sec:     27.39KB

8 threads:

$ ./wrk -t8 -d30s -c100 -R100 http://myhost
Running 30s test @ http://myhost
  8 threads and 100 connections
  Thread calibration: mean lat.: 323.346ms, rate sampling interval: 902ms
  Thread calibration: mean lat.: 321.696ms, rate sampling interval: 901ms
  Thread calibration: mean lat.: 329.348ms, rate sampling interval: 895ms
  Thread calibration: mean lat.: 324.543ms, rate sampling interval: 912ms
  Thread calibration: mean lat.: 324.012ms, rate sampling interval: 907ms
  Thread calibration: mean lat.: 330.061ms, rate sampling interval: 910ms
  Thread calibration: mean lat.: 332.053ms, rate sampling interval: 914ms
  Thread calibration: mean lat.: 333.218ms, rate sampling interval: 910ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   335.85ms  129.11ms 488.96ms   79.56%
    Req/Sec    12.25      1.72    16.00     86.05%
  2994 requests in 30.06s, 830.37KB read
Requests/sec:     99.59
Transfer/sec:     27.62KB

I get similar results when I run the test for longer (5 minutes). The latencies measured on the server side match what wrk reports, so I wonder whether there is something different about the way multi-threaded load is generated.
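
For what it's worth, here is a quick sanity check on the numbers above. It is only a sketch: it assumes the -R rate is split evenly across threads (the per-thread Req/Sec figures seem to suggest that, but I have not confirmed it), and the names `runs`, `per_thread_share`, and `summarize` are just mine for illustration. The figures are copied from the three runs above.

```python
# Figures copied from the three runs above (all with -c100 -R100).
TARGET_RATE = 100  # value passed to -R

runs = {
    1: {"avg_latency_ms": 23.23, "req_sec_per_thread": 98.30, "total_rps": 99.96},
    4: {"avg_latency_ms": 315.79, "req_sec_per_thread": 24.92, "total_rps": 98.74},
    8: {"avg_latency_ms": 335.85, "req_sec_per_thread": 12.25, "total_rps": 99.59},
}


def per_thread_share(total, threads):
    # Assumption (not confirmed): the total is divided evenly across threads.
    return total / threads


def summarize(runs, target_rate=TARGET_RATE):
    """Compare each run against the single-thread baseline."""
    baseline_ms = runs[1]["avg_latency_ms"]
    rows = []
    for threads, r in sorted(runs.items()):
        rows.append({
            "threads": threads,
            "expected_rate_per_thread": per_thread_share(target_rate, threads),
            "observed_rate_per_thread": r["req_sec_per_thread"],
            "total_rps": r["total_rps"],
            "latency_vs_1_thread": r["avg_latency_ms"] / baseline_ms,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(runs):
        print(
            f"threads={row['threads']}  "
            f"expected/thread={row['expected_rate_per_thread']:.2f}  "
            f"observed/thread={row['observed_rate_per_thread']:.2f}  "
            f"total={row['total_rps']:.2f}  "
            f"latency x{row['latency_vs_1_thread']:.1f}"
        )
```

The total throughput and the per-thread rates look as expected in every run, so the only thing that changes with the thread count is the latency.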
