caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

Performance of reverse proxy middleware #939

Closed · tomasdeml closed this issue 8 years ago

tomasdeml commented 8 years ago

1. What version of Caddy are you running (caddy -version)?

v0.8.3 / v0.9.0, on Windows Server 2012 R2 running on an Azure 'Standard D3 v2' instance (4 cores, 14 GB memory)

2. What are you trying to do?

Measure performance of caddy reverse proxy middleware using Apache Bench.

3. What is your entire Caddyfile?

caddyfile_proxy:

http:// {
    errors CaddyErrors.log
    header / {
        -Server
    }
    proxy / localhost:580 {
        proxy_header Host {host}
        proxy_header X-Real-IP {remote}
        proxy_header X-Forwarded-Proto {scheme}
    }
}

caddyfile_upstream:

http://:580 {
    errors CaddyErrors_upstream.log
    header / {
        -Server
    }
    root WebRoot
}

The WebRoot folder contains file index.html:

<!DOCTYPE html><html><head><meta charset="utf-8"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="width=device-width,initial-scale=1"><title>1234567 12345</title></head><body></body></html>

4. How did you run Caddy (give the full command and describe the execution environment)?

caddy.exe -conf=Caddyfile_proxy

and then in a new shell:

caddy.exe -conf=Caddyfile_upstream

Rationale: We would like to use caddy as a simple reverse proxy for one of our backend services. To measure the proxy's overhead, I used Apache Bench and got sub-optimal results. For the benchmark I set up one caddy instance acting as the proxy and another caddy instance representing the backend (upstream).

To establish a baseline for caddy performance, I ran ab.exe -n 1000000 -c 1000 -k http://remote-machine-running-caddy:580/ from another machine, hitting the upstream instance (caddy v0.8.3) directly. I got the following result (one of three runs; the others were very similar):

This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking *** (be patient) 

Server Software:        
Server Hostname:        ***
Server Port:            580

Document Path:          /
Document Length:        224 bytes

Concurrency Level:      1000
Time taken for tests:   90.600 seconds
Complete requests:      1000000
Failed requests:        0
Keep-Alive requests:    1000000
Total transferred:      433000000 bytes
HTML transferred:       224000000 bytes
Requests per second:    11037.47 [#/sec] (mean)
Time per request:       90.600 [ms] (mean)
Time per request:       0.091 [ms] (mean, across all concurrent requests)
Transfer rate:          4667.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      18
Processing:     0   90  15.3     94     692
Waiting:        0   90  15.3     94     692
Total:          0   90  15.3     94     692

Percentage of the requests served within a certain time (ms)
  50%     94
  66%     94
  75%     94
  80%     94
  90%    109
  95%    109
  98%    125
  99%    125
 100%    692 (longest request)

Then I started the proxy and ran ab.exe -n 1000000 -c 1000 -k http://remote-machine-running-caddy/. I got the following result (again one of three runs):

Server Software:        
Server Hostname:        ***
Server Port:            80

Document Path:          /
Document Length:        224 bytes

Concurrency Level:      1000
Time taken for tests:   236.380 seconds
Complete requests:      1000000
Failed requests:        0
Keep-Alive requests:    1000000
Total transferred:      448000000 bytes
HTML transferred:       224000000 bytes
Requests per second:    4230.47 [#/sec] (mean)
Time per request:       236.380 [ms] (mean)
Time per request:       0.236 [ms] (mean, across all concurrent requests)
Transfer rate:          1850.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      20
Processing:     0  236 1088.9    203   62405
Waiting:        0  236 1088.9    203   62405
Total:          0  236 1088.9    203   62405

Percentage of the requests served within a certain time (ms)
  50%    203
  66%    203
  75%    204
  80%    219
  90%    234
  95%    250
  98%    283
  99%    328
 100%  62405 (longest request)

I re-ran the tests against Caddy v0.9.0 and got even worse results. Without proxy:

Server Software:        
Server Hostname:        ***
Server Port:            580

Document Path:          /
Document Length:        224 bytes

Concurrency Level:      1000
Time taken for tests:   147.463 seconds
Complete requests:      1000000
Failed requests:        0
Keep-Alive requests:    1000000
Total transferred:      456000000 bytes
HTML transferred:       224000000 bytes
Requests per second:    6781.37 [#/sec] (mean)
Time per request:       147.463 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          3019.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      18
Processing:     0  147  65.4    141    1419
Waiting:        0  147  65.4    141    1419
Total:          0  147  65.4    141    1419

Percentage of the requests served within a certain time (ms)
  50%    141
  66%    156
  75%    172
  80%    177
  90%    205
  95%    250
  98%    322
  99%    368
 100%   1419 (longest request)

Unfortunately I could not get a clean run through the proxy (see Failed requests below), most likely because of issue #938:

Server Software:        
Server Hostname:        ***
Server Port:            80

Document Path:          /
Document Length:        224 bytes

Concurrency Level:      1000
Time taken for tests:   90.625 seconds
Complete requests:      1000000
Failed requests:        970161
   (Connect: 0, Receive: 0, Length: 970161, Exceptions: 0)
Non-2xx responses:      970161
Keep-Alive requests:    1000000
Total transferred:      207116208 bytes
HTML transferred:       22206512 bytes
Requests per second:    11034.47 [#/sec] (mean)
Time per request:       90.625 [ms] (mean)
Time per request:       0.091 [ms] (mean, across all concurrent requests)
Transfer rate:          2231.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      19
Processing:     0   90 186.2     49    3207
Waiting:        0   90 186.2     49    3207
Total:          0   90 186.2     49    3207

Percentage of the requests served within a certain time (ms)
  50%     49
  66%     94
  75%     95
  80%    109
  90%    125
  95%    150
  98%    406
  99%    970
 100%   3207 (longest request)

Is this kind of performance expected?

tomasdeml commented 8 years ago

I realised the poor performance might be due to CPU contention between the two caddy instances, so I repeated the tests with caddy_upstream running on another machine. The results are better, but the throughput drop through the proxy is still about 50%.

Baseline results for caddy_upstream v0.8.3 without the proxy, command .\ab.exe -n 50000 -c 1000 -k http://remote-machine-upstream/ (note the reduced request count because of issue #938):

Concurrency Level:      1000
Time taken for tests:   5.597 seconds
Complete requests:      50000
Failed requests:        0
Keep-Alive requests:    50000
Total transferred:      21650000 bytes
HTML transferred:       11200000 bytes
Requests per second:    8932.93 [#/sec] (mean)
Time per request:       111.945 [ms] (mean)
Time per request:       0.112 [ms] (mean, across all concurrent requests)
Transfer rate:          3777.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0      16
Processing:     0  106  33.6    109     535
Waiting:        0  106  33.6    109     535
Total:          0  106  33.6    109     535

Percentage of the requests served within a certain time (ms)
  50%    109
  66%    109
  75%    109
  80%    109
  90%    109
  95%    110
  98%    125
  99%    267
 100%    535 (longest request)

And results with proxy, command .\ab.exe -n 50000 -c 1000 -k http://remote-machine-running-caddy/:

Concurrency Level:      1000
Time taken for tests:   11.020 seconds
Complete requests:      50000
Failed requests:        0
Keep-Alive requests:    50000
Total transferred:      22400000 bytes
HTML transferred:       11200000 bytes
Requests per second:    4537.37 [#/sec] (mean)
Time per request:       220.392 [ms] (mean)
Time per request:       0.220 [ms] (mean, across all concurrent requests)
Transfer rate:          1985.10 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0      16
Processing:    16  162  92.4    156    3304
Waiting:       16  162  92.4    156    3304
Total:         16  162  92.4    156    3304

Percentage of the requests served within a certain time (ms)
  50%    156
  66%    187
  75%    203
  80%    207
  90%    240
  95%    273
  98%    375
  99%    461
 100%   3304 (longest request)

Edit: Results for caddy v0.9.0. Baseline without proxy:

Concurrency Level:      1000
Time taken for tests:   8.428 seconds
Complete requests:      50000
Failed requests:        0
Keep-Alive requests:    50000
Total transferred:      22800000 bytes
HTML transferred:       11200000 bytes
Requests per second:    5932.95 [#/sec] (mean)
Time per request:       168.550 [ms] (mean)
Time per request:       0.169 [ms] (mean, across all concurrent requests)
Transfer rate:          2642.02 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0      16
Processing:     0  162  81.0    156    2728
Waiting:        0  162  81.0    156    2728
Total:          0  162  81.0    156    2728

Percentage of the requests served within a certain time (ms)
  50%    156
  66%    172
  75%    188
  80%    203
  90%    235
  95%    285
  98%    358
  99%    404
 100%   2728 (longest request)

And with proxy:

Concurrency Level:      1000
Time taken for tests:   26.484 seconds
Complete requests:      50000
Failed requests:        0
Keep-Alive requests:    50000
Total transferred:      23550000 bytes
HTML transferred:       11200000 bytes
Requests per second:    1887.95 [#/sec] (mean)
Time per request:       529.675 [ms] (mean)
Time per request:       0.530 [ms] (mean, across all concurrent requests)
Transfer rate:          868.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0  19.0      0    3009
Processing:    16  404 653.0    313   12093
Waiting:       16  404 653.0    313   12093
Total:         16  404 653.6    313   12093

Percentage of the requests served within a certain time (ms)
  50%    313
  66%    359
  75%    394
  80%    418
  90%    473
  95%    531
  98%    740
  99%   3317
 100%  12093 (longest request)

Ping between bench, proxy and upstream machines is max 2 ms.

abiosoft commented 8 years ago

Mind using https://github.com/wg/wrk or https://github.com/rakyll/boom ?
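
For reference, a roughly equivalent wrk invocation would look like the sketch below (wrk normally needs a Unix-like load-generator host and is driven by a test duration rather than a request count; the thread count of 4 and the 60-second duration are arbitrary assumptions):

wrk -t4 -c1000 -d60s http://remote-machine-running-caddy/

wrk keeps connections alive by default, so ab's -k flag has no counterpart here.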

tomasdeml commented 8 years ago

@abiosoft Thank you for suggesting other tools; I know ab is not exactly state of the art. I repeated the tests with boom (again with the proxy and upstream on different machines) and the relative performance difference looks about the same.

Baseline for Caddy v0.8.3 without proxy (.\boom.exe -n 50000 -c 1000 http://remote-machine-upstream/):

Summary:
  Total:    5.5015 secs
  Slowest:  5.2515 secs
  Fastest:  0.0000 secs
  Average:  0.1024 secs
  Requests/sec: 9088.4154
  Total data:   11200000 bytes
  Size/request: 224 bytes

Status code distribution:
  [200] 50000 responses

Response time histogram:
  0.000 [1536]  |∎
  0.525 [48399] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1.050 [7] |
  1.575 [6] |
  2.101 [5] |
  2.626 [6] |
  3.151 [5] |
  3.676 [5] |
  4.201 [5] |
  4.726 [5] |
  5.252 [21]    |

Latency distribution:
  10% in 0.0625 secs
  25% in 0.0937 secs
  50% in 0.0938 secs
  75% in 0.1094 secs
  90% in 0.1108 secs
  95% in 0.1406 secs
  99% in 0.2812 secs

And with proxy (.\boom.exe -n 50000 -c 1000 http://remote-machine-with-caddy/):

Summary:
  Total:    11.3033 secs
  Slowest:  9.0447 secs
  Fastest:  0.0000 secs
  Average:  0.1403 secs
  Requests/sec: 4423.4731
  Total data:   11200000 bytes
  Size/request: 224 bytes

Status code distribution:
  [200] 50000 responses

Response time histogram:
  0.000 [1790]  |∎
  0.904 [47981] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1.809 [0] |
  2.713 [0] |
  3.618 [13]    |
  4.522 [215]   |
  5.427 [0] |
  6.331 [0] |
  7.236 [0] |
  8.140 [0] |
  9.045 [1] |

Latency distribution:
  10% in 0.0312 secs
  25% in 0.0781 secs
  50% in 0.1250 secs
  75% in 0.1719 secs
  90% in 0.2031 secs
  95% in 0.2343 secs
  99% in 0.3906 secs

Baseline Caddy v0.9.0 without proxy:

Summary:
  Total:    10.1248 secs
  Slowest:  9.8588 secs
  Fastest:  0.0000 secs
  Average:  0.1360 secs
  Requests/sec: 4938.3901
  Total data:   11200000 bytes
  Size/request: 224 bytes

Status code distribution:
  [200] 50000 responses

Response time histogram:
  0.000 [5909]  |∎∎∎∎∎
  0.986 [43676] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1.972 [10]    |
  2.958 [47]    |
  3.944 [38]    |
  4.929 [7] |
  5.915 [67]    |
  6.901 [31]    |
  7.887 [93]    |
  8.873 [0] |
  9.859 [122]   |

Latency distribution:
  25% in 0.0156 secs
  50% in 0.0781 secs
  75% in 0.1250 secs
  90% in 0.1562 secs
  95% in 0.1875 secs
  99% in 0.3947 secs

And with proxy:

Summary:
  Total:    23.5891 secs
  Slowest:  11.4876 secs
  Fastest:  0.0000 secs
  Average:  0.2954 secs
  Requests/sec: 2119.6269
  Total data:   11200000 bytes
  Size/request: 224 bytes

Status code distribution:
  [200] 50000 responses

Response time histogram:
  0.000 [642]   |
  1.149 [48942] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.298 [96]    |
  3.446 [26]    |
  4.595 [46]    |
  5.744 [27]    |
  6.893 [12]    |
  8.041 [15]    |
  9.190 [15]    |
  10.339 [15]   |
  11.488 [164]  |

Latency distribution:
  10% in 0.0797 secs
  25% in 0.1718 secs
  50% in 0.2344 secs
  75% in 0.3125 secs
  90% in 0.4062 secs
  95% in 0.4687 secs
  99% in 0.7049 secs

princemaple commented 8 years ago

Does https://github.com/mholt/caddy/pull/880 play any role in this?

nemosupremo commented 8 years ago

@tomasdeml

With #984, you should be able to pass the ab benchmarks by increasing the keepalive directive in your proxy. By default it is 2, and you should increase it depending on how many concurrent connections you expect. I'm not 100% sure what the correct value is here (I'm not super familiar with the pooling code in net/http/transport.go), so it will take some testing.
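
For illustration, a sketch of how the proxy block from the Caddyfile above might look with that keepalive subdirective, assuming it takes a single numeric argument as described; the value 1000 merely mirrors the benchmark's concurrency level and is an assumption, not a tuned recommendation:

http:// {
    errors CaddyErrors.log
    header / {
        -Server
    }
    proxy / localhost:580 {
        proxy_header Host {host}
        proxy_header X-Real-IP {remote}
        proxy_header X-Forwarded-Proto {scheme}
        keepalive 1000
    }
}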

Feel free to reopen this - if you do, I'd also like to see some tests with nginx on the same hardware.
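
As background on the pooling behaviour mentioned above, here is a minimal, hypothetical Go sketch (not Caddy's actual code) of a reverse proxy whose Transport raises MaxIdleConnsPerHost above net/http's default of 2; the upstream address is borrowed from this issue's Caddyfile and the limits are assumptions:

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "time"
)

func main() {
    // Upstream address borrowed from the Caddyfile in this issue.
    upstream, err := url.Parse("http://localhost:580")
    if err != nil {
        log.Fatal(err)
    }

    proxy := httputil.NewSingleHostReverseProxy(upstream)
    proxy.Transport = &http.Transport{
        // net/http keeps only DefaultMaxIdleConnsPerHost (2) idle
        // connections per upstream by default; with 1000 concurrent
        // keep-alive clients most upstream connections are closed and
        // reopened, so the idle pool is sized toward the expected
        // concurrency here.
        MaxIdleConnsPerHost: 1000,
        IdleConnTimeout:     90 * time.Second,
    }

    log.Fatal(http.ListenAndServe(":80", proxy))
}

With a keep-alive pool sized closer to the client concurrency, the proxy can reuse upstream connections instead of constantly dialing new ones, which is the effect the keepalive directive is meant to expose.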