Closed kostyrev closed 4 years ago
How often did this happen and how many failures did you see? What was the CPU load during the test?I haven't seen this error but my guess is that you should see some aborted connections/requests on the upstream servers as well. Could you run the same test on the HTTP port as well?
I see those errors every time I run wrk
.
over https
top shows 33.0%
there's one error of 2017/04/14 22:08:38 http: TLS handshake error from 188.123.241.55:12061: EOF
and 13-14 errors of 2017/04/14 22:09:38 http: proxy error: context canceled
over http
top show 27~31%
16-22 errors of 2017/04/14 22:14:58 http: proxy error: context canceled
I see no errors in nginx error.log on upstream servers.
I've got the same error during my bench with vegeta tools ...
https://github.com/tsenart/vegeta
1500req/s , 10 workers during 3minutes. A lot of errors like this :
2017/05/23 11:49:07 http: proxy error: context canceled 2017/05/23 11:49:07 http: proxy error: context canceled 2017/05/23 11:49:07 http: proxy error: context canceled 2017/05/23 11:49:07 http: TLS handshake error from 184.13.64.47:46838: EOF 2017/05/23 11:49:07 http: TLS handshake error from 184.13.64.47:59158: EOF 2017/05/23 11:49:08 http: proxy error: context canceled 2017/05/23 11:49:08 http: proxy error: context canceled 2017/05/23 11:49:08 http: TLS handshake error from 184.13.64.47:39837: EOF 2017/05/23 11:49:09 http: TLS handshake error from 184.13.64.47:51372: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:46966: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:45049: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:36919: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:40404: EOF 2017/05/23 11:49:13 [INFO] consul: Health changed to #308644 2017/05/23 11:49:18 http: TLS handshake error from 184.13.64.47:53264: EOF 2017/05/23 11:49:18 http: TLS handshake error from 184.13.64.47:54748: EOF 2017/05/23 11:49:19 http: TLS handshake error from 184.13.64.47:35223: EOF 2017/05/23 11:49:19 http: TLS handshake error from 184.13.64.47:52443: EOF 2017/05/23 11:49:20 http: TLS handshake error from 184.13.64.47:36657: EOF
1 fabio, listening in HTTPS, 5 backend (just 5 nginx with a hello world) serving in HTTP (fabio soes the SSL terminaison)
See the error in vegeta :
Requests [total, rate] 270000, 1500.01 Duration [total, attack, wait] 3m23.559738851s, 2m59.999153185s, 23.560585666s Latencies [mean, 50, 95, 99, max] 9.012213368s, 4.520873ms, 38.542513598s, 45.01141293s, 1m0.000954494s Bytes In [total, mean] 105788484, 391.81 Bytes Out [total, mean] 0, 0.00 Success [ratio] 64.02% Status Codes [code:count] 200:172857 0:96521 502:622 Error Set: Get https://192.163.167.143:10443/: dial tcp 0.0.0.0:0->192.163.167.143:10443: bind: address already in use 502 Bad Gateway Get https://192.163.167.143:10443/: net/http: TLS handshake timeout Get https://192.163.167.143:10443/: net/http: timeout awaiting response headers Get https://192.163.167.143:10443/: dial tcp 0.0.0.0:0->192.163.167.143:10443: i/o timeout
Nginx use standard configuration (i can maybe tune it a little ...) but only 300req/s per NGINX so not so much.
Fabio use 102400 value for nofile and 32768 for nproc and this :
sudo sysctl -w fs.file-max="9999999" sudo sysctl -w fs.nr_open="9999999"
I do not know if the client failed (32 cores and 512GB of RAM for running the benchmark), or fabio or nginx ... nor explicit log in nginx
We see the "context cancelled" at a rate of about 3% of all requests. "TLS handshake error" at about 0.3%.
CPU load is negligable (couple %), RAM usage ~50MB.
+1
I'm also noticing lots of context cancelled messages in the error log and they are mostly correlated with requests being logged with Response 502 in the access log.
@magiconair do you know if the context cancelled error can be causing lots of 502 in our access logs and why? I've done some research and people have this issue in the Caddy proxy server as well and it seems to be related to clients closing the connection prematurely.
From my understanding, 502s should be thrown only if Fabio is unable to connect to upstreams which I believe is not the case in our environment.
https://caddy.community/t/error-context-canceled/2034 https://github.com/mholt/caddy/issues/1345 https://github.com/mholt/caddy/issues/1925
@danlsgiga I'm not sure if related but in our scenario Fabio can connect to the upstreams fine, see #673
@scalp42 in our scenario too, its just misleading logs!
while benchmarking fabio I've noticed
in logs. it is t2.micro AWS instance that was tuned with those changes:
could you please elaborate on those errors in logs? I used wrk
behind balancer there were two
t2.medium
instances with nginx installed.