envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Requests co-located on worker thread 503 when upstream closes connection #27330

Closed: jbohanon closed this issue 1 year ago

jbohanon commented 1 year ago



Description:


When load testing against a NodeJS upstream, some percentage of requests receive 503 responses. This behavior is not exhibited when nginx is used as the proxy instead of Envoy. Logs indicate that all of the requests which receive a 503 are being handled on the same worker thread.

Repro steps:


reproduction script

```sh
#!/usr/bin/bash
SCRIPT_CALL_PATH="$(ps $$ | tail -1 | awk '{print $6}')"
if [[ $SCRIPT_CALL_PATH =~ [^.]/get-deps-and-run-test.sh ]]; then
  echo "script must be run from within testing directory"
  exit 1
fi
if [[ -z $(which curl) ]]; then
  echo "script requires curl command to download dependencies"
  exit 1
fi

# Get k6
if [ ! -f "./k6" ]; then
  curl -LO "https://github.com/grafana/k6/releases/download/v0.44.1/k6-v0.44.1-linux-amd64.tar.gz"
  tar -zxf k6-v0.44.1-linux-amd64.tar.gz
  mv k6-v0.44.1-linux-amd64/k6 ./k6
  rm -rf ./k6-v0.44.1-linux-amd64
  rm ./k6-v0.44.1-linux-amd64.tar.gz
fi

# Get node
if [ ! -f "./node" ]; then
  curl -LO "https://nodejs.org/dist/v18.16.0/node-v18.16.0-linux-x64.tar.xz"
  tar -xf node-v18.16.0-linux-x64.tar.xz
  ln -s "$(pwd)/node-v18.16.0-linux-x64/bin/node" ./node
  rm ./node-v18.16.0-linux-x64.tar.xz
fi

# Get envoy
if [ ! -f "./envoy" ] && [ -z "${USE_NGINX}" ]; then
  curl -LO "https://github.com/envoyproxy/envoy/releases/download/v1.26.1/envoy-1.26.1-linux-x86_64"
  mv envoy-1.26.1-linux-x86_64 envoy
  chmod +x envoy
fi

# Get nginx
if [ ! -d "./nginx" ] && [ -n "${USE_NGINX}" ]; then
  curl -LO "https://nginx.org/download/nginx-1.24.0.tar.gz"
  tar -zxf nginx-1.24.0.tar.gz
  cd nginx-1.24.0 || exit 1
  mkdir -p "$(pwd)/out/logs"
  ./configure --prefix="$(pwd)/out"
  make
  # sudo make install
  cp ../nginx.conf ./out/
  cd ..
  rm nginx-1.24.0.tar.gz
fi

if [ -z "${USE_NGINX}" ]; then
  # Run envoy
  ./envoy --config-path ./envoy.yaml --log-level debug 1>./envoy.log 2>&1 &
  PROXY_PID=$!
else
  # Run nginx
  ./nginx-1.24.0/objs/nginx -c nginx.conf &
fi

# Run our test upstream
./node node-server.js &
NODE_SERVER_PID=$!

# Run our k6 test
./k6 run ./k6-proxy-node.js

if [ -z "${USE_NGINX}" ]; then
  curl http://localhost:19000/stats -o ./envoy_metrics
  kill $PROXY_PID
else
  nginx-1.24.0/objs/nginx -s stop
fi
kill $NODE_SERVER_PID
```

`envoy.yaml` is identical to the configuration reproduced in full under "Config:" below.

`node-server.js`

```javascript
let http = require("http")

let host = '0.0.0.0'
let port = 4000

let server = http.createServer((_, res) => {
  res.writeHead(200)
  res.end()
})

server.listen(port, host, () => {
  console.log(`Server is running on http://${host}:${port}`)
})
```

`k6-proxy-node.js`

```javascript
import http from "k6/http";
import { sleep } from "k6";
import { check } from "k6";

export const options = {
  vus: 1000,
  duration: "10s",
};

export default function () {
  const res = http.get("http://localhost:8082/");
  sleep(5);
  check(res, {
    "is status 503": (r) => r.status === 503,
  });
  if (res.status == 503) {
    console.error(res.body);
  }
}
```

`nginx.conf`

```
# user nginx;
worker_processes 3;
error_log ./nginx_error.log;

events {
  worker_connections 10240;
}

http {
  log_format main 'remote_addr:$remote_addr\t'
                  'time_local:$time_local\t'
                  'method:$request_method\t'
                  'uri:$request_uri\t'
                  'host:$host\t'
                  'status:$status\t'
                  'bytes_sent:$body_bytes_sent\t'
                  'referer:$http_referer\t'
                  'useragent:$http_user_agent\t'
                  'forwardedfor:$http_x_forwarded_for\t'
                  'request_time:$request_time';

  access_log ./nginx_access.log main;

  server {
    listen 8082 default_server;
    error_log ./nginx_error.log;

    location / {
      proxy_pass http://127.0.0.1:4000;
      proxy_http_version 1.1;
    }
  }
}
```

Admin and Stats Output:


outputs

`stats`: https://gist.github.com/jbohanon/b27599338f10b6035818a9c847be0303/raw/0ef68ef0a25e72286dbe9b6f24539838bc769ecf/stats.txt
`clusters`: https://gist.github.com/jbohanon/b27599338f10b6035818a9c847be0303/raw/0ef68ef0a25e72286dbe9b6f24539838bc769ecf/clusters.txt
`server_info`: https://gist.github.com/jbohanon/b27599338f10b6035818a9c847be0303/raw/0ef68ef0a25e72286dbe9b6f24539838bc769ecf/server_info.txt


Config:


```yaml
static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 19000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: service
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 8082
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: test-node-server
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: service
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 19001
  - name: test-node-server
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: test-node-server
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 4000
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 19001
```

Logs:


https://gist.github.com/jbohanon/b27599338f10b6035818a9c847be0303/raw/0ef68ef0a25e72286dbe9b6f24539838bc769ecf/envoy.log


Additional Context: This is related to https://github.com/envoyproxy/envoy/issues/23625 and https://github.com/solo-io/gloo/issues/6647. This is possibly a regression of https://github.com/envoyproxy/envoy/issues/6190.

kyessenov commented 1 year ago

If I read the code correctly, there's an inherent race here: upstream requests are assigned to codecs (HTTP/1.1 connections), and if the connection is closed after assignment, the request triggers a downstream 503. I suppose a retry would solve this, but ultimately the upstream requests would need to be moved between codecs, and that's an involved change.

cc @alyssawilk

jbohanon commented 1 year ago

I added

```yaml
retry_policy:
  retry_on: reset,5xx
  num_retries: 10
```

to the route config, which left some test runs completely free of the issue, while other runs still showed the failures. We are definitely entering onUpstreamReset: some reset events are retried as expected, while others are simply not retried at all. This appears to be due to retry resource limits, as the stats below indicate we are not tripping the circuit breakers but are hitting retry overflow.

```
cluster.test-node-server.circuit_breakers.default.cx_open: 0
cluster.test-node-server.circuit_breakers.default.cx_pool_open: 0
cluster.test-node-server.circuit_breakers.default.rq_open: 0
cluster.test-node-server.circuit_breakers.default.rq_pending_open: 0
cluster.test-node-server.circuit_breakers.default.rq_retry_open: 0
cluster.test-node-server.circuit_breakers.high.cx_open: 0
cluster.test-node-server.circuit_breakers.high.cx_pool_open: 0
cluster.test-node-server.circuit_breakers.high.rq_open: 0
cluster.test-node-server.circuit_breakers.high.rq_pending_open: 0
cluster.test-node-server.circuit_breakers.high.rq_retry_open: 0
...
cluster.test-node-server.upstream_rq_retry: 3
cluster.test-node-server.upstream_rq_retry_backoff_exponential: 3
cluster.test-node-server.upstream_rq_retry_backoff_ratelimited: 0
cluster.test-node-server.upstream_rq_retry_limit_exceeded: 0
cluster.test-node-server.upstream_rq_retry_overflow: 11
cluster.test-node-server.upstream_rq_retry_success: 3
```

All that to say, it seems retries are not going to be the solution, since they just overflow the retry resource.
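For context, the `upstream_rq_retry_overflow` counter tracks requests that were not retried because of the cluster's retry circuit breaker (the `max_retries` threshold, which defaults to 3) or its retry budget. If one did want to lean harder on retries here, that ceiling can be raised per cluster; a minimal sketch, with illustrative values that are assumptions rather than recommendations:

```yaml
clusters:
- name: test-node-server
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      # Raise the concurrent-retry ceiling (the default is 3, which a
      # 1000-VU load test exhausts quickly).
      max_retries: 100
      # Alternatively, budget retries as a fraction of active requests;
      # when retry_budget is set, max_retries is ignored for this priority.
      retry_budget:
        budget_percent:
          value: 25.0
        min_retry_concurrency: 10
```

This only widens the retry headroom; it does not remove the underlying close/reuse race.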

jbohanon commented 1 year ago

What are some instances where we would enter onUpstreamReset with a remote reset aside from this case? It could be valid that the upstream is unhealthy and is dropping connections, in which case we wouldn't want to keep thrashing connections by retrying or moving to a new codec. Could we check if the reset was received on another stream? Are we multiplexing the HTTP/1.1 connections or are the requests in question queued on the codec?

Edit: according to these docs, we explicitly expect not to have this kind of pollution in the event of an upstream reset. I think I'm now understanding the race being discussed.

alyssawilk commented 1 year ago

A couple of things: if A and B come in simultaneously, then unless you have strict connection limits you'll end up with A and B being served simultaneously on separate connections. But in general, yes, for HTTP/1.1 we'll reuse connections, so if we get new stream B right as A is finishing, B will use the same upstream HTTP/1.1 connection.

If the upstream serves response A and sends `connection: close`, we won't reassign the connection to B. If the upstream serves response A and sends an immediate FIN, there's a chance we'll pick up the FIN before the connection is reused, but there is an inherent race condition where the TCP FIN can be delayed on the network and only arrive after B has already sent headers on that connection. There's not much you can do about that except 1) try to have backends send `connection: close` when they intend to close, as they're supposed to, and 2) have retries on so B would try again.
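One way to shrink that race window, if the backend's keep-alive behavior can't be changed, is to have Envoy retire pooled upstream connections before the backend's own keep-alive timeout fires, so the proxy side initiates the close. A sketch of upstream protocol options that would do this, assuming the Node server keeps its default keep-alive timeout of roughly 5 seconds (the 2s/30s values are illustrative assumptions to tune):

```yaml
clusters:
- name: test-node-server
  # ... existing cluster config ...
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http_protocol_options: {}   # plain HTTP/1.1 to the upstream
      common_http_protocol_options:
        # Drop idle pooled connections before the upstream's ~5s keep-alive
        # timeout, so Envoy closes them instead of racing a surprise FIN.
        idle_timeout: 2s
        # Optionally cap total connection lifetime as well.
        max_connection_duration: 30s
```

This narrows the window but, as noted above, cannot eliminate it; retries remain the backstop for the residual race.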

jbohanon commented 1 year ago

@alyssawilk and @kyessenov thank you for these explanations. I see the complication in the race you described. We are okay with using retries, injecting Connection: Close headers into the upstream request, and/or (ideally) fixing the upstream server's behavior.

I do still think that this behavior could be improved if there's a reliable way to detect and identify this event at runtime, but we are no longer blocked on it.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

triplewy commented 1 year ago

It seems that configuring retries does not work for this scenario: I consistently see 60-second response times with `upstream_reset_before_response_started{connection_termination}`, even though the upstream per-try timeout is set to 5 seconds and the upstream timeout is set to 15 seconds.
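For reference, a route-level configuration along the lines being described would look roughly like this; it is a sketch with assumed values, where `retry_on: reset` is the condition intended to catch `connection_termination` resets:

```yaml
routes:
- match:
    prefix: "/"
  route:
    cluster: test-node-server
    # Overall upstream timeout for the request.
    timeout: 15s
    retry_policy:
      # "reset" covers the upstream terminating the connection before the
      # response has started.
      retry_on: reset,connect-failure,5xx
      num_retries: 3
      # Each attempt, including retries, is bounded separately.
      per_try_timeout: 5s
```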

@jbohanon Did retries help solve this issue for you? It does not seem to solve this issue for me.