envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy crashes if ext_authz cluster is incorrect #3259

Closed · yolken-stripe closed this issue 6 years ago

yolken-stripe commented 6 years ago

Title: Envoy crashes if ext_authz cluster is incorrect

Description:


If an ext_authz http filter is configured with a non-existent cluster, Envoy crashes when the first request comes in. Ideally, it wouldn't crash and would instead follow the behavior encoded in the failure_mode_allow setting.

Repro steps:


  1. Build envoy binary from HEAD of master: bazel build -c dbg //source/exe:envoy-static
  2. Run bazel-bin/source/exe/envoy-static -c config.yaml (see config contents below)
  3. Hit the listener: curl localhost:10001

Admin and Stats Output:


clusters:

version_info::static
service1::default_priority::max_connections::1024
service1::default_priority::max_pending_requests::1024
service1::default_priority::max_requests::1024
service1::default_priority::max_retries::3
service1::high_priority::max_connections::1024
service1::high_priority::max_pending_requests::1024
service1::high_priority::max_requests::1024
service1::high_priority::max_retries::3
service1::added_via_api::false
service1::127.0.0.1:10002::cx_active::0
service1::127.0.0.1:10002::cx_connect_fail::0
service1::127.0.0.1:10002::cx_total::0
service1::127.0.0.1:10002::rq_active::0
service1::127.0.0.1:10002::rq_error::0
service1::127.0.0.1:10002::rq_success::0
service1::127.0.0.1:10002::rq_timeout::0
service1::127.0.0.1:10002::rq_total::0
service1::127.0.0.1:10002::health_flags::healthy
service1::127.0.0.1:10002::weight::1
service1::127.0.0.1:10002::region::
service1::127.0.0.1:10002::zone::
service1::127.0.0.1:10002::sub_zone::
service1::127.0.0.1:10002::canary::false
service1::127.0.0.1:10002::success_rate::-1
ext-authz::default_priority::max_connections::1024
ext-authz::default_priority::max_pending_requests::1024
ext-authz::default_priority::max_requests::1024
ext-authz::default_priority::max_retries::3
ext-authz::high_priority::max_connections::1024
ext-authz::high_priority::max_pending_requests::1024
ext-authz::high_priority::max_requests::1024
ext-authz::high_priority::max_retries::3
ext-authz::added_via_api::false
ext-authz::127.0.0.1:10003::cx_active::0
ext-authz::127.0.0.1:10003::cx_connect_fail::0
ext-authz::127.0.0.1:10003::cx_total::0
ext-authz::127.0.0.1:10003::rq_active::0
ext-authz::127.0.0.1:10003::rq_error::0
ext-authz::127.0.0.1:10003::rq_success::0
ext-authz::127.0.0.1:10003::rq_timeout::0
ext-authz::127.0.0.1:10003::rq_total::0
ext-authz::127.0.0.1:10003::health_flags::healthy
ext-authz::127.0.0.1:10003::weight::1
ext-authz::127.0.0.1:10003::region::
ext-authz::127.0.0.1:10003::zone::
ext-authz::127.0.0.1:10003::sub_zone::
ext-authz::127.0.0.1:10003::canary::false
ext-authz::127.0.0.1:10003::success_rate::-1

server_info:

envoy 13de384ab34428af99c53201f6b3c95991b7ae10/1.7.0-dev/Clean/DEBUG live 67 67 0

Config:


static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: service1
          forward_client_cert_details: SANITIZE_SET
          set_current_client_cert_details:
            subject: true
          http_filters:
          - config:
              grpc_service:
                envoy_grpc:
                  cluster_name: ext-authzbad
            name: envoy.ext_authz
          - config: {}
            name: envoy.router
  clusters:
  - name: service1
    connect_timeout: 0.25s
    type: static
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 10002
  - name: ext-authz
    connect_timeout: 0.25s
    type: static
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: 127.0.0.1
        port_value: 10003
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8001

Logs:


st-yolken1:envoy yolken$ bazel-bin/source/exe/envoy-static -c envoy-static.yaml 
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:188] initializing epoch 0 (hot restart version=disabled)
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:190] statically linked extensions:
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:192]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:195]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:198]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:201]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:203]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.statsd
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:205]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:208]   transport_sockets.downstream: raw_buffer,ssl
[2018-04-30 18:15:19.497][8209936][info][main] source/server/server.cc:211]   transport_sockets.upstream: raw_buffer,ssl
[2018-04-30 18:15:19.514][8209936][info][upstream] source/common/upstream/cluster_manager_impl.cc:131] cm init: all clusters initialized
[2018-04-30 18:15:19.514][8209936][info][config] source/server/configuration_impl.cc:52] loading 1 listener(s)
[2018-04-30 18:15:19.523][8209936][info][config] source/server/configuration_impl.cc:92] loading tracing configuration
[2018-04-30 18:15:19.523][8209936][info][config] source/server/configuration_impl.cc:114] loading stats sink configuration
[2018-04-30 18:15:19.523][8209936][info][main] source/server/server.cc:371] all clusters initialized. initializing init manager
[2018-04-30 18:15:19.523][8209936][info][config] source/server/listener_manager_impl.cc:608] all dependencies initialized. starting workers
[2018-04-30 18:15:19.523][8209936][info][main] source/server/server.cc:387] starting main dispatch loop
[2018-04-30 18:15:24.367][8209954][critical][assert] source/common/router/router.cc:104] assert failure: !upstream_request_
[2018-04-30 18:15:24.367][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] Caught Abort trap: 6, suspect faulting address 0x7fff65966e3e
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:87] Backtrace obj<envoy-static> thr<123145343406080>:
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:99] thr<123145343406080> obj<envoy-static                        0x00000001071b98d6 _ZN8backward7details6unwindINS_14StackTraceImplINS_10system_tag10darwin_tagEE8callbackEEEmT_m>
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145343406080> #0 0x1071b98d6: 
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:99] thr<123145343406080> obj<envoy-static>
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145343406080> #1 0x1071b9465: backward::StackTraceImpl<backward::system_tag::darwin_tag>::load_here(unsigned long) + 101
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145343406080> #2 0x1071b9261: backward::StackTraceImpl<backward::system_tag::darwin_tag>::load_from(void*, unsigned long) + 49
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145343406080> #3 0x1071b77ee: Envoy::BackwardsTrace::captureFrom(void*) + 46
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:105] thr<123145343406080> #4 0x1071b76af: Envoy::SignalAction::sigHandler(int, __siginfo*, void*) + 143
[2018-04-30 18:15:24.377][8209954][critical][backtrace] bazel-out/darwin-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:110] end backtrace thread 123145343406080
Abort trap: 6

Call Stack:


See above.

mattklein123 commented 6 years ago

@gsagula @saumoh

saumoh commented 6 years ago

I can take a look at this.

gsagula commented 6 years ago

@saumoh Let me know what you find. I'm working on the filter right now, so we could try to squeeze the fix in.

saumoh commented 6 years ago

$ bazel-bin/source/exe/envoy-static -l debug -c config.yaml 2>&1 | tools/stack_decode.py
[2018-05-04 11:44:09.818][12278][info][main] source/server/server.cc:203] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-05-04 11:44:09.818][12278][info][main] source/server/server.cc:205] statically linked extensions:
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:207]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:210]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:213]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:216]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:218]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.statsd
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:220]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:223]   transport_sockets.downstream: raw_buffer,ssl
[2018-05-04 11:44:09.819][12278][info][main] source/server/server.cc:226]   transport_sockets.upstream: raw_buffer,ssl
[2018-05-04 11:44:09.845][12278][debug][main] source/server/server.cc:254] admin address: 0.0.0.0:8001
[2018-05-04 11:44:09.855][12282][debug][grpc] source/common/grpc/google_async_client_impl.cc:37] completionThread running
[2018-05-04 11:44:09.865][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster ext-authz
[2018-05-04 11:44:09.866][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster service1
[2018-05-04 11:44:09.866][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster ext-authz
[2018-05-04 11:44:09.867][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:87] cm init: init complete: cluster=ext-authz primary=0 secondary=0
[2018-05-04 11:44:09.867][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:59] cm init: adding: cluster=ext-authz primary=0 secondary=0
[2018-05-04 11:44:09.867][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster service1
[2018-05-04 11:44:09.867][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:87] cm init: init complete: cluster=service1 primary=0 secondary=0
[2018-05-04 11:44:09.868][12278][debug][upstream] source/common/upstream/cluster_manager_impl.cc:59] cm init: adding: cluster=service1 primary=0 secondary=0
[2018-05-04 11:44:09.868][12278][info][upstream] source/common/upstream/cluster_manager_impl.cc:131] cm init: all clusters initialized
[2018-05-04 11:44:09.868][12278][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-05-04 11:44:09.868][12278][debug][config] source/server/configuration_impl.cc:55] listener #0:
[2018-05-04 11:44:09.868][12278][debug][config] source/server/listener_manager_impl.cc:337] begin add/update listener: name=27b67fad-c102-4dfe-b7a5-45943c977f82 hash=6901193575677395340
[2018-05-04 11:44:09.868][12278][debug][config] source/server/listener_manager_impl.cc:33]   filter #0:
[2018-05-04 11:44:09.868][12278][debug][config] source/server/listener_manager_impl.cc:34]     name: envoy.http_connection_manager
[2018-05-04 11:44:09.869][12278][debug][config] source/server/listener_manager_impl.cc:37]   config: {"http_filters":[{"config":{"grpc_service":{"envoy_grpc":{"cluster_name":"ext-authzbad"}}},"name":"envoy.ext_authz"},{"config":{},"name":"envoy.router"}],"forward_client_cert_details":"SANITIZE_SET","route_config":{"virtual_hosts":[{"domains":["*"],"routes":[{"match":{"prefix":"/"},"route":{"cluster":"service1"}}],"name":"backend"}],"name":"local_route"},"codec_type":"auto","stat_prefix":"ingress_http","set_current_client_cert_details":{"subject":true}}
[2018-05-04 11:44:09.879][12278][debug][config] source/extensions/filters/network/http_connection_manager/config.cc:241]     filter #0
[2018-05-04 11:44:09.879][12278][debug][config] source/extensions/filters/network/http_connection_manager/config.cc:242]       name: envoy.ext_authz
[2018-05-04 11:44:09.879][12278][debug][config] source/extensions/filters/network/http_connection_manager/config.cc:246]     config: {"grpc_service":{"envoy_grpc":{"cluster_name":"ext-authzbad"}}}
[2018-05-04 11:44:09.880][12278][debug][config] source/extensions/filters/network/http_connection_manager/config.cc:241]     filter #1
[2018-05-04 11:44:09.880][12278][debug][config] source/extensions/filters/network/http_connection_manager/config.cc:242]       name: envoy.router
[2018-05-04 11:44:09.880][12278][debug][config] source/extensions/filters/network/http_connection_manager/config.cc:246]     config: {}
[2018-05-04 11:44:09.880][12278][debug][config] source/server/listener_manager_impl.cc:265] add active listener: name=27b67fad-c102-4dfe-b7a5-45943c977f82, hash=6901193575677395340, address=0.0.0.0:10001
[2018-05-04 11:44:09.880][12278][info][config] source/server/configuration_impl.cc:93] loading tracing configuration
[2018-05-04 11:44:09.880][12278][info][config] source/server/configuration_impl.cc:115] loading stats sink configuration
[2018-05-04 11:44:09.880][12278][info][main] source/server/server.cc:386] all clusters initialized. initializing init manager
[2018-05-04 11:44:09.881][12278][info][config] source/server/listener_manager_impl.cc:607] all dependencies initialized. starting workers
[2018-05-04 11:44:09.881][12284][debug][main] source/server/worker_impl.cc:91] worker entering dispatch loop
[2018-05-04 11:44:09.881][12285][debug][main] source/server/worker_impl.cc:91] worker entering dispatch loop
[2018-05-04 11:44:09.881][12286][debug][main] source/server/worker_impl.cc:91] worker entering dispatch loop
[2018-05-04 11:44:09.881][12284][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster ext-authz
[2018-05-04 11:44:09.881][12278][info][main] source/server/server.cc:402] starting main dispatch loop
[2018-05-04 11:44:09.881][12285][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster ext-authz
[2018-05-04 11:44:09.881][12285][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster service1
[2018-05-04 11:44:09.881][12285][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster ext-authz
[2018-05-04 11:44:09.881][12285][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster service1
[2018-05-04 11:44:09.882][12290][debug][grpc] source/common/grpc/google_async_client_impl.cc:37] completionThread running
[2018-05-04 11:44:09.881][12288][debug][grpc] source/common/grpc/google_async_client_impl.cc:37] completionThread running
[2018-05-04 11:44:09.881][12286][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster ext-authz
[2018-05-04 11:44:09.882][12286][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster service1
[2018-05-04 11:44:09.882][12286][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster ext-authz
[2018-05-04 11:44:09.882][12286][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster service1
[2018-05-04 11:44:09.881][12287][debug][grpc] source/common/grpc/google_async_client_impl.cc:37] completionThread running
[2018-05-04 11:44:09.881][12289][debug][main] source/server/worker_impl.cc:91] worker entering dispatch loop
[2018-05-04 11:44:09.883][12289][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster ext-authz
[2018-05-04 11:44:09.883][12291][debug][grpc] source/common/grpc/google_async_client_impl.cc:37] completionThread running
[2018-05-04 11:44:09.883][12289][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster service1
[2018-05-04 11:44:09.883][12289][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster ext-authz
[2018-05-04 11:44:09.883][12289][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster service1
[2018-05-04 11:44:09.885][12284][debug][upstream] source/common/upstream/cluster_manager_impl.cc:659] adding TLS initial cluster service1
[2018-05-04 11:44:09.885][12284][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster ext-authz
[2018-05-04 11:44:09.885][12284][debug][upstream] source/common/upstream/cluster_manager_impl.cc:744] membership update for TLS cluster service1
[2018-05-04 11:44:13.684][12286][debug][main] source/server/connection_handler_impl.cc:196] [C1] new connection
[2018-05-04 11:44:13.684][12286][debug][http] source/common/http/conn_manager_impl.cc:186] [C1] new stream
[2018-05-04 11:44:13.687][12286][debug][http] source/common/http/conn_manager_impl.cc:798] [C1][S9501573907094606844] request end stream
[2018-05-04 11:44:13.687][12286][debug][http] source/common/http/conn_manager_impl.cc:452] [C1][S9501573907094606844] request headers complete (end_stream=true):
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:457] [C1][S9501573907094606844]   ':authority':'localhost:10001'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:457] [C1][S9501573907094606844]   'user-agent':'curl/7.47.0'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:457] [C1][S9501573907094606844]   'accept':'*/*'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:457] [C1][S9501573907094606844]   ':path':'/'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:457] [C1][S9501573907094606844]   ':method':'GET'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S9501573907094606844] encoding headers via codec (end_stream=true):
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:976] [C1][S9501573907094606844]   ':status':'403'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:976] [C1][S9501573907094606844]   'date':'Fri, 04 May 2018 18:44:13 GMT'
[2018-05-04 11:44:13.688][12286][debug][http] source/common/http/conn_manager_impl.cc:976] [C1][S9501573907094606844]   'server':'envoy'
[2018-05-04 11:44:13.688][12286][debug][router] source/common/router/router.cc:250] [C1][S9501573907094606844] cluster 'service1' match for URL '/'
[2018-05-04 11:44:13.688][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   ':authority':'localhost:10001'
[2018-05-04 11:44:13.688][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   'user-agent':'curl/7.47.0'
[2018-05-04 11:44:13.688][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   'accept':'*/*'
[2018-05-04 11:44:13.688][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   ':path':'/'
[2018-05-04 11:44:13.689][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   ':method':'GET'
[2018-05-04 11:44:13.689][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   'x-forwarded-proto':'http'
[2018-05-04 11:44:13.689][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   'x-request-id':'901981a4-b459-4fdf-a1d9-58bb39270185'
[2018-05-04 11:44:13.689][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   'x-envoy-expected-rq-timeout-ms':'15000'
[2018-05-04 11:44:13.689][12286][debug][router] source/common/router/router.cc:298] [C1][S9501573907094606844]   ':scheme':'http'
[2018-05-04 11:44:13.689][12286][debug][pool] source/common/http/http1/conn_pool.cc:79] creating a new connection
[2018-05-04 11:44:13.689][12286][debug][client] source/common/http/codec_client.cc:25] [C2] connecting
[2018-05-04 11:44:13.689][12286][debug][connection] source/common/network/connection_impl.cc:565] [C2] connecting to 127.0.0.1:10002
[2018-05-04 11:44:13.689][12286][debug][connection] source/common/network/connection_impl.cc:574] [C2] connection in progress
[2018-05-04 11:44:13.689][12286][debug][pool] source/common/http/http1/conn_pool.cc:105] queueing request due to no available connections
[2018-05-04 11:44:13.689][12286][critical][assert] source/common/router/router.cc:104] assert failure: !upstream_request_
[2018-05-04 11:44:13.689][12286][critical][backtrace] bazel-out/k8-dbg/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:114] Caught Aborted, suspect faulting address 0x3e800002ff6
[2018-05-04 11:44:13.690][12286][critical] Backtrace (most recent call first) from thread 12286:
  #1 ?? ??:0
  #2 ?? ??:0
  #3 
  #4 Envoy::Router::Filter::~Filter() at router.cc:104 (discriminator 4)
  #5 Envoy::Router::ProdFilter::~ProdFilter() at router.h:377
  #6 void __gnu_cxx::new_allocator<Envoy::Router::ProdFilter>::destroy<Envoy::Router::ProdFilter>(Envoy::Router::ProdFilter*) at new_allocator.h:124
  #7 void std::allocator_traits<std::allocator<Envoy::Router::ProdFilter> >::destroy<Envoy::Router::ProdFilter>(std::allocator<Envoy::Router::ProdFilter>&, Envoy::Router::ProdFilter*) at alloc_traits.h:542
  #8 std::_Sp_counted_ptr_inplace<Envoy::Router::ProdFilter, std::allocator<Envoy::Router::ProdFilter>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at shared_ptr_base.h:531
  #9 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() at shared_ptr_base.h:150
  #10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() at shared_ptr_base.h:659
  #11 std::__shared_ptr<Envoy::Http::StreamDecoderFilter, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() at shared_ptr_base.h:925
  #12 std::shared_ptr<Envoy::Http::StreamDecoderFilter>::~shared_ptr() at shared_ptr.h:93
  #13 Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter::~ActiveStreamDecoderFilter() at conn_manager_impl.h:133
  #14 Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter::~ActiveStreamDecoderFilter() at conn_manager_impl.h:133
  #15 std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter>::operator()(Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter*) const at unique_ptr.h:76 (discriminator 1)
  #16 std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> >::~unique_ptr() at unique_ptr.h:236
  #17 std::_List_node<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > >::~_List_node() at stl_list.h:106
  #18 void __gnu_cxx::new_allocator<std::_List_node<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > > >::destroy<std::_List_node<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > > >(std::_List_node<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > >*) at new_allocator.h:124
  #19 std::__cxx11::_List_base<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> >, std::allocator<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > > >::_M_clear() at list.tcc:75
  #20 std::__cxx11::_List_base<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> >, std::allocator<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > > >::~_List_base() at stl_list.h:446
  #21 std::__cxx11::list<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> >, std::allocator<std::unique_ptr<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter, std::default_delete<Envoy::Http::ConnectionManagerImpl::ActiveStreamDecoderFilter> > > >::~list() at stl_list.h:507
  #22 Envoy::Http::ConnectionManagerImpl::ActiveStream::~ActiveStream() at conn_manager_impl.cc:374
  #23 Envoy::Http::ConnectionManagerImpl::ActiveStream::~ActiveStream() at conn_manager_impl.cc:395
  #24 std::default_delete<Envoy::Event::DeferredDeletable>::operator()(Envoy::Event::DeferredDeletable*) const at unique_ptr.h:76 (discriminator 1)
  #25 std::unique_ptr<Envoy::Event::DeferredDeletable, std::default_delete<Envoy::Event::DeferredDeletable> >::reset(Envoy::Event::DeferredDeletable*) at unique_ptr.h:344
  #26 Envoy::Event::DispatcherImpl::clearDeferredDeleteList() at dispatcher_impl.cc:68 (discriminator 2)
  #27 Envoy::Event::DispatcherImpl::DispatcherImpl(std::unique_ptr<Envoy::Buffer::WatermarkFactory, std::default_delete<Envoy::Buffer::WatermarkFactory> >&&)::{lambda()#1}::operator()() const at dispatcher_impl.cc:35
  #28 std::_Function_handler<void (), Envoy::Event::DispatcherImpl::DispatcherImpl(std::unique_ptr<Envoy::Buffer::WatermarkFactory, std::default_delete<Envoy::Buffer::WatermarkFactory> >&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at functional:1871
  #29 std::function<void ()>::operator()() const at functional:2267
  #30 Envoy::Event::TimerImpl::TimerImpl(Envoy::Event::DispatcherImpl&, std::function<void ()>)::{lambda(int, short, void*)#1}::operator()(int, short, void*) const at timer_impl.cc:15
  #31 Envoy::Event::TimerImpl::TimerImpl(Envoy::Event::DispatcherImpl&, std::function<void ()>)::{lambda(int, short, void*)#1}::_FUN(int, short, void*) at timer_impl.cc:15
  #32 event_process_active_single_queue at event.c:1646
  #33 event_process_active at event.c:1738
  #34  (inlined by) event_base_loop at event.c:1961
  #35 Envoy::Event::DispatcherImpl::run(Envoy::Event::Dispatcher::RunType) at dispatcher_impl.cc:161 (discriminator 4)
  #36 Envoy::Server::WorkerImpl::threadRoutine(Envoy::Server::GuardDog&) at worker_impl.cc:94
  #37 Envoy::Server::WorkerImpl::start(Envoy::Server::GuardDog&)::{lambda()#1}::operator()() const at worker_impl.cc:67
  #38 std::_Function_handler<void (), Envoy::Server::WorkerImpl::start(Envoy::Server::GuardDog&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at functional:1871
  #39 std::function<void ()>::operator()() const at functional:2267
  #40 Envoy::Thread::Thread::Thread(std::function<void ()>)::{lambda(void*)#1}::operator()(void*) const at thread.cc:22
  #41 Envoy::Thread::Thread::Thread(std::function<void ()>)::{lambda(void*)#1}::_FUN(void*) at thread.cc:24
  #42 
  #43 ?? ??:0
  #44 
  #45 ?? ??:0
  #46 
julia-stripe commented 6 years ago

I spent a couple of hours looking into this yesterday. Here's what I found:

The crash happens inside this loop in conn_manager_impl.cc

The sequence of events is:

  1. Something (??) goes wrong with the ext_authz filter (presumably because of the bad configuration).
  2. The onDestroy methods of all the filters (including the router) get called.
  3. The router filter keeps getting used (decodeHeaders is called) after its onDestroy method has been called. This results in the upstream_request_ variable being set, on line 312.
  4. When the router filter's destructor is called, an assertion fails because the router filter's state was changed after onDestroy was called (upstream_request_ isn't null, and it should be).

I'm having trouble telling which component is to blame here. It seems like the ext_authz filter isn't functioning correctly (maybe it should crash Envoy earlier?), but it also seems like a misbehaving filter earlier in the chain shouldn't cause the router to fail like this (after onDestroy is called, presumably the router should stop processing requests?).
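
To make the broken invariant concrete, here is a heavily condensed, self-contained sketch. RouterFilter and UpstreamRequest are illustrative stand-ins, not the real Envoy classes; only the invariant mirrors router.cc. Compile without -DNDEBUG so the assert is live:

#include <cassert>
#include <memory>

struct UpstreamRequest {};

// Stand-in for Envoy::Router::Filter, reduced to the invariant in question.
class RouterFilter {
public:
  ~RouterFilter() {
    // Corresponds to the assert at router.cc:104 that fires in this bug:
    // by destruction time, onDestroy() must have been the last call made.
    assert(!upstream_request_);
  }
  void onDestroy() { upstream_request_.reset(); }
  void decodeHeaders() {
    // Step 3 above: invoked *after* onDestroy(), this re-populates
    // upstream_request_ and silently breaks the invariant.
    upstream_request_ = std::make_unique<UpstreamRequest>();
  }

private:
  std::unique_ptr<UpstreamRequest> upstream_request_;
};

int main() {
  RouterFilter router;
  router.onDestroy();     // filter chain torn down first...
  router.decodeHeaders(); // ...but the router is still driven afterwards
}                         // destructor asserts here, as in the backtrace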

saumoh commented 6 years ago

@julia-stripe Thanks for taking a look. We want the ext_authz filter to be resilient to a missing cluster. In that case, the ext_authz filter should close the client connection (see here), and since it sits before the router in the chain, I would expect the router filter not to be invoked at all.

I think the cleanup/close isn't being done correctly by the ext_authz filter.

Or perhaps the router is being invoked for the bad authz cluster?

saumoh commented 6 years ago

@julia-stripe do you think this could be the problem? onComplete is invoked on the stack (not asynchronously), so state_ != State::Calling, and therefore the ext_authz filter returns Http::FilterHeadersStatus::Continue here. Is that why the router is getting called?
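
To make that control flow concrete, here is a condensed sketch of the pre-patch behavior. ExtAuthzFilter and the enums are stand-ins mirroring the names in ext_authz.cc/.h, not the actual filter code:

#include <iostream>

enum class State { NotStarted, Calling, Complete };
enum class FilterHeadersStatus { Continue, StopIteration };

struct ExtAuthzFilter {
  State state_ = State::NotStarted;

  // With a bad cluster, the check client fails fast and invokes
  // onComplete() on the stack, before initiateCall() returns.
  void initiateCall() {
    state_ = State::Calling;
    onComplete(); // synchronous "denied"/"error" completion
  }

  // Pre-patch: no StopDecoding state; the real filter also encodes a
  // local 403 response from here before the state is overwritten.
  void onComplete() { state_ = State::Complete; }

  FilterHeadersStatus decodeHeaders() {
    initiateCall();
    // By now state_ is already Complete, so this returns Continue and
    // the router runs against a stream that was already responded to.
    return state_ == State::Calling ? FilterHeadersStatus::StopIteration
                                    : FilterHeadersStatus::Continue;
  }
};

int main() {
  ExtAuthzFilter filter;
  std::cout << (filter.decodeHeaders() == FilterHeadersStatus::Continue
                    ? "Continue -> router still invoked\n"
                    : "StopIteration\n");
}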

PS: That seems to be the problem.

diff --git a/source/extensions/filters/http/ext_authz/ext_authz.cc b/source/extensions/filters/http/ext_authz/ext_authz.cc
index 70a1156..2d36150 100644
--- a/source/extensions/filters/http/ext_authz/ext_authz.cc
+++ b/source/extensions/filters/http/ext_authz/ext_authz.cc
@@ -53,17 +53,17 @@ void Filter::initiateCall(const Http::HeaderMap& headers) {

 Http::FilterHeadersStatus Filter::decodeHeaders(Http::HeaderMap& headers, bool) {
   initiateCall(headers);
-  return state_ == State::Calling ? Http::FilterHeadersStatus::StopIteration
+  return (state_ == State::Calling) || (state_ == State::StopDecoding) ? Http::FilterHeadersStatus::StopIteration
                                   : Http::FilterHeadersStatus::Continue;
 }

 Http::FilterDataStatus Filter::decodeData(Buffer::Instance&, bool) {
-  return state_ == State::Calling ? Http::FilterDataStatus::StopIterationAndWatermark
+  return (state_ == State::Calling) || (state_ == State::StopDecoding) ? Http::FilterDataStatus::StopIterationAndWatermark
                                   : Http::FilterDataStatus::Continue;
 }

 Http::FilterTrailersStatus Filter::decodeTrailers(Http::HeaderMap&) {
-  return state_ == State::Calling ? Http::FilterTrailersStatus::StopIteration
+  return (state_ == State::Calling) || (state_ == State::StopDecoding) ? Http::FilterTrailersStatus::StopIteration
                                   : Http::FilterTrailersStatus::Continue;
 }

@@ -116,6 +116,7 @@ void Filter::onComplete(Filters::Common::ExtAuthz::CheckStatus status) {
     callbacks_->encodeHeaders(std::move(response_headers), true);
     callbacks_->requestInfo().setResponseFlag(
         RequestInfo::ResponseFlag::UnauthorizedExternalService);
+    state_ = State::StopDecoding;
   } else {
     if (config_->failureModeAllow() && status == CheckStatus::Error) {
       // Status is Error and yet we are allowing the request. Click a counter.
diff --git a/source/extensions/filters/http/ext_authz/ext_authz.h b/source/extensions/filters/http/ext_authz/ext_authz.h
index 794a614..2b16f8a 100644
--- a/source/extensions/filters/http/ext_authz/ext_authz.h
+++ b/source/extensions/filters/http/ext_authz/ext_authz.h
@@ -80,7 +80,7 @@ public:
   void onComplete(Filters::Common::ExtAuthz::CheckStatus status) override;

 private:
-  enum class State { NotStarted, Calling, Complete };
+  enum class State { NotStarted, Calling, Complete, StopDecoding };
   void initiateCall(const Http::HeaderMap& headers);

   FilterConfigSharedPtr config_;
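
(With this change, a synchronous denial in onComplete(), which has already encoded the local 403, moves the filter into the new StopDecoding state, so decodeHeaders/decodeData/decodeTrailers keep returning their stop statuses and iteration never resumes into the router for a stream that has already been answered — which should avoid the !upstream_request_ assert above.)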