fabiolb / fabio

Consul Load-Balancing made simple
https://fabiolb.net
MIT License
7.27k stars 616 forks source link

http proxy error context canceled #264

Closed kostyrev closed 4 years ago

kostyrev commented 7 years ago

while benchmarking fabio I've noticed

2017/04/14 16:19:51 http: proxy error: context canceled

in logs. it is t2.micro AWS instance that was tuned with those changes:

sudo sysctl -w fs.file-max="9999999"
sudo sysctl -w fs.nr_open="9999999"
cat > /etc/security/limits.d/95-nofile.conf <<EOF
kostyrev soft nofile 102400
kostyrev hard nofile 102400
EOF
proxy.cs = cs=fast;type=file;cert=fast.com.c;key=fast.com.key
proxy.addr = :8080,:8443;cs=fast
./fabio-1.4.2-go1.8.1-linux_amd64 -cfg fabio.properties

could you please elaborate on those errors in logs? I used wrk

./wrk -t20 -c1000 -d60s --latency https://IP:8443/nginx/

behind balancer there were two t2.medium instances with nginx installed.

magiconair commented 7 years ago

How often did this happen and how many failures did you see? What was the CPU load during the test?I haven't seen this error but my guess is that you should see some aborted connections/requests on the upstream servers as well. Could you run the same test on the HTTP port as well?

kostyrev commented 7 years ago

I see those errors every time I run wrk. over https top shows 33.0% there's one error of 2017/04/14 22:08:38 http: TLS handshake error from 188.123.241.55:12061: EOF and 13-14 errors of 2017/04/14 22:09:38 http: proxy error: context canceled

over http top show 27~31% 16-22 errors of 2017/04/14 22:14:58 http: proxy error: context canceled

I see no errors in nginx error.log on upstream servers.

deuch commented 7 years ago

I've got the same error during my bench with vegeta tools ...

https://github.com/tsenart/vegeta

1500req/s , 10 workers during 3minutes. A lot of errors like this :

2017/05/23 11:49:07 http: proxy error: context canceled 2017/05/23 11:49:07 http: proxy error: context canceled 2017/05/23 11:49:07 http: proxy error: context canceled 2017/05/23 11:49:07 http: TLS handshake error from 184.13.64.47:46838: EOF 2017/05/23 11:49:07 http: TLS handshake error from 184.13.64.47:59158: EOF 2017/05/23 11:49:08 http: proxy error: context canceled 2017/05/23 11:49:08 http: proxy error: context canceled 2017/05/23 11:49:08 http: TLS handshake error from 184.13.64.47:39837: EOF 2017/05/23 11:49:09 http: TLS handshake error from 184.13.64.47:51372: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:46966: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:45049: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:36919: EOF 2017/05/23 11:49:12 http: TLS handshake error from 184.13.64.47:40404: EOF 2017/05/23 11:49:13 [INFO] consul: Health changed to #308644 2017/05/23 11:49:18 http: TLS handshake error from 184.13.64.47:53264: EOF 2017/05/23 11:49:18 http: TLS handshake error from 184.13.64.47:54748: EOF 2017/05/23 11:49:19 http: TLS handshake error from 184.13.64.47:35223: EOF 2017/05/23 11:49:19 http: TLS handshake error from 184.13.64.47:52443: EOF 2017/05/23 11:49:20 http: TLS handshake error from 184.13.64.47:36657: EOF

1 fabio, listening in HTTPS, 5 backend (just 5 nginx with a hello world) serving in HTTP (fabio soes the SSL terminaison)

See the error in vegeta :

Requests [total, rate] 270000, 1500.01 Duration [total, attack, wait] 3m23.559738851s, 2m59.999153185s, 23.560585666s Latencies [mean, 50, 95, 99, max] 9.012213368s, 4.520873ms, 38.542513598s, 45.01141293s, 1m0.000954494s Bytes In [total, mean] 105788484, 391.81 Bytes Out [total, mean] 0, 0.00 Success [ratio] 64.02% Status Codes [code:count] 200:172857 0:96521 502:622 Error Set: Get https://192.163.167.143:10443/: dial tcp 0.0.0.0:0->192.163.167.143:10443: bind: address already in use 502 Bad Gateway Get https://192.163.167.143:10443/: net/http: TLS handshake timeout Get https://192.163.167.143:10443/: net/http: timeout awaiting response headers Get https://192.163.167.143:10443/: dial tcp 0.0.0.0:0->192.163.167.143:10443: i/o timeout

Nginx use standard configuration (i can maybe tune it a little ...) but only 300req/s per NGINX so not so much.

Fabio use 102400 value for nofile and 32768 for nproc and this :

sudo sysctl -w fs.file-max="9999999" sudo sysctl -w fs.nr_open="9999999"

I do not know if the client failed (32 cores and 512GB of RAM for running the benchmark), or fabio or nginx ... nor explicit log in nginx

tino commented 6 years ago

We see the "context cancelled" at a rate of about 3% of all requests. "TLS handshake error" at about 0.3%.

CPU load is negligable (couple %), RAM usage ~50MB.

chenjpu commented 5 years ago

+1

danlsgiga commented 5 years ago

I'm also noticing lots of context cancelled messages in the error log and they are mostly correlated with requests being logged with Response 502 in the access log.

@magiconair do you know if the context cancelled error can be causing lots of 502 in our access logs and why? I've done some research and people have this issue in the Caddy proxy server as well and it seems to be related to clients closing the connection prematurely.

From my understanding, 502s should be thrown only if Fabio is unable to connect to upstreams which I believe is not the case in our environment.

Reference

https://caddy.community/t/error-context-canceled/2034 https://github.com/mholt/caddy/issues/1345 https://github.com/mholt/caddy/issues/1925

scalp42 commented 5 years ago

@danlsgiga I'm not sure if related but in our scenario Fabio can connect to the upstreams fine, see #673

danlsgiga commented 5 years ago

@scalp42 in our scenario too, its just misleading logs!