envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

upstream connect error or disconnect/reset before headers. reset reason: connection termination #23883

Closed alphamarket closed 1 year ago

alphamarket commented 2 years ago

We have deployed Envoy in Kubernetes. These are the pods:

$ kubectl get pods --all-namespaces  -l 'app in (app,envoy)'

ingress   envoy-86549f6fdc-7lwph   1/1     Running   0          34m
ingress   envoy-86549f6fdc-jmm9h   1/1     Running   0          34m
ingress   envoy-86549f6fdc-xn4tv   1/1     Running   0          40m
ingress   app-748c5c9bdd-9fxpf     1/1     Running   0          11m
ingress   app-748c5c9bdd-g75np     1/1     Running   0          7m40s
ingress   app-748c5c9bdd-gxm6t     1/1     Running   0          8m55s

As you can see, all the Envoy replicas and the app pods are in the Running state, but when we benchmark the app using the ApacheBench (ab) tool:

$ ab -v4 -kn1000 -c100 http://example.com/page/eaa8d66b-b169-42cd-98c2-6c654f1d9175

We observe some 503 error responses from Envoy:

LOG: header received:
HTTP/1.0 503 Service Unavailable
content-length: 95
content-type: text/plain
date: Tue, 08 Nov 2022 09:48:48 GMT
server: envoy
connection: keep-alive

upstream connect error or disconnect/reset before headers. reset reason: connection termination
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking example.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        envoy
Server Hostname:        example.com
Server Port:            80

Document Path:          /page/eaa8d66b-b169-42cd-98c2-6c654f1d9175
Document Length:        0 bytes

Concurrency Level:      100
Time taken for tests:   0.177 seconds
Complete requests:      1000
Failed requests:        37
   (Connect: 0, Receive: 0, Length: 37, Exceptions: 0)
Non-2xx responses:      1000
Keep-Alive requests:    1000
Total transferred:      405311 bytes
HTML transferred:       3515 bytes
Requests per second:    5655.79 [#/sec] (mean)
Time per request:       17.681 [ms] (mean)
Time per request:       0.177 [ms] (mean, across all concurrent requests)
Transfer rate:          2238.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   2.1      0       9
Processing:     7   13   5.7      9      64
Waiting:        6   13   5.7      9      64
Total:          7   13   6.3     15      64

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     16
  75%     16
  80%     16
  90%     23
  95%     24
  98%     31
  99%     32
 100%     64 (longest request)

All pods are active, and every single app pod can easily handle 1000 requests. So why are we seeing some "upstream connect error or disconnect/reset before headers. reset reason: connection termination" errors?
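One observation from the ab summary above, offered as a cross-check rather than a diagnosis: the 37 length failures multiplied by the 95-byte body of the 503 response account exactly for the "HTML transferred" figure. Since ab also reports "Non-2xx responses: 1000", the remaining responses appear to have been non-2xx with an empty body, so it may be worth confirming which status the app itself returns for this URL. A minimal sketch of the arithmetic, with all numbers copied from the output above:

```shell
# Figures copied from the ab summary and the 503 response headers above.
failed_length=37        # "Failed requests: 37 (Length: 37)"
body_503_bytes=95       # "content-length: 95" on the 503 response
html_transferred=3515   # "HTML transferred: 3515 bytes"
complete=1000           # "Complete requests: 1000"

# Body bytes explained by the 503 responses alone.
expected=$((failed_length * body_503_bytes))
# Responses left over, which must have carried an empty body.
empty_body=$((complete - failed_length))

echo "bytes explained by 503 bodies: $expected of $html_transferred"
echo "responses with an empty body:  $empty_body"
```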

phlax commented 2 years ago

@alphamarket could you provide some more information, especially the Envoy version? It might also be helpful to see the config.

Did you try debug logging the Envoy proxies? Are there any clues in the app log?

alphamarket commented 2 years ago

@phlax Envoy version: envoyproxy/envoy:v1.23-latest

> did you try debug logging the Envoy proxies?

How can I do that?

> Are there any clues in the app log?

app is a very robust C++ engine, each app instance is capable of handling 6,000 requests per second, app is not a bottleneck here...

phlax commented 2 years ago

> How can I do that?

https://www.envoyproxy.io/docs/envoy/latest/start/quick-start/run-envoy#debugging-envoy

> app is a very robust C++ engine, each app instance is capable of handling 6,000 requests per second, app is not a bottleneck here...

No, but the app log might give some indication as to why the connection is being terminated.
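For an Envoy that is already running in a pod, the log level can also be changed at runtime through the admin interface's /logging endpoint, without restarting with a different -l / --log-level flag. The following is a minimal sketch, assuming the admin port is reachable on 127.0.0.1:9901 (for example through kubectl port-forward); the helper name envoy_log_level and the ADMIN variable are hypothetical and only used for illustration.

```shell
# Hypothetical helper: set the log level of one Envoy instance via the admin API.
# Assumes a port-forward to the pod is already running, e.g.:
#   kubectl -n ingress port-forward pod/envoy-86549f6fdc-7lwph 9901:9901
ADMIN="${ADMIN:-http://127.0.0.1:9901}"

envoy_log_level() {
  # $1 = log level, e.g. debug or info
  curl -s -X POST "${ADMIN}/logging?level=$1"
}

# Usage:
#   envoy_log_level debug                              # raise verbosity
#   kubectl -n ingress logs -f envoy-86549f6fdc-7lwph  # watch the proxy log during the ab run
#   envoy_log_level info                               # restore afterwards
```

With three replicas behind the service, the level would need to be raised on each pod, since any of them may serve the failing requests.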

alphamarket commented 2 years ago

@phlax Envoy's config:

node:
  cluster: envoy-cluster
  id: 3921a62b-e522-42d7-88d7-1cbbcbfacbd2

dynamic_resources:
  lds_config:
    path: /etc/envoy/envoy-lds.yaml

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901

static_resources:
  ################################################################################
  # Clusters
  ################################################################################
  clusters:

  # Cluster: app
  - name: app_cluster
    type: STRICT_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: app_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: app
                port_value: 80

# /etc/envoy/envoy-lds.yaml
resources:
################################################################################
# HTTP listeners
################################################################################
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: http_listener
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 80
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        http_protocol_options:
          accept_http_10: true
        stat_prefix: ingress_http
        use_remote_address: true
        xff_num_trusted_hops: 0
        access_log:
        - name: envoy.access_loggers.stdout
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
        http_filters:
        - name: envoy.filters.http.router
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
        route_config:
          name: services_route
          virtual_hosts:

          # example.com
          - name: example.com
            domains: ["example.com"]
            retry_policy:
              retry_on: 5xx,reset,connect-failure,refused-stream
              num_retries: 10
              per_try_timeout: 10s
            routes:
            - match: { prefix: "/" }
              route: { cluster: app_cluster }
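
The config above exposes the admin interface on port 9901, so the cluster counters could be inspected after a benchmark run to see whether upstream connections were closed while requests were in flight and whether the retry policy fired. This is a rough sketch only: the helper name filter_upstream_stats is hypothetical, and it simply filters the plain-text /stats output for app_cluster counters related to connection teardown and retries.

```shell
# Hypothetical helper: read Envoy "/stats" text on stdin and keep only the
# app_cluster counters for upstream connection teardown and request retries.
filter_upstream_stats() {
  grep -E '^cluster\.app_cluster\.(upstream_cx_destroy|upstream_rq_retry)'
}

# Usage (assumes the admin port is reachable, e.g. via kubectl port-forward):
#   curl -s http://127.0.0.1:9901/stats | filter_upstream_stats
```

A non-zero upstream_cx_destroy_remote_with_active_rq counter would suggest that the app side closed connections that still had a request outstanding, which is one possible source of the "connection termination" reset reason.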

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.