Experiencing upstream connection failures since v1.21.1

cboitel commented 2 years ago

Title: upstream connection failure since upgrade to v1.21.1

Description:

Was using v1.20.0 to connect to a series of upstream services using TLS as part of our non-regression tests. Upgrading to v1.21.1 lead to failing tests due to intermittent errors (not the same service...) all of which had in common

503 error reported by envoy
access log reports response code details "upstream_reset_before_response_started{connection_failure}" and response_flags UF

We increased our default connection timeout to local services (inside a docker-compose) from 0.1s to .25s and failures no longer are reported.

Repro steps: We use:

a first docker-compose to start start envoy and 12 upstream echo services (golang implemented) with TLS support tuned to test TLS v1.2/1.3 combinations
a second docker-compose starts a series of curl commands to test the various combinations
some of the curl expected 200 status being reported and finally get 503 instead
Use envoy v1.20.0, no error reported

Admin and Stats Output: N/A

Config: N/A Logs:

I have a tcpdump of TLS nego for both versions of envoy and it is clearly showing extra latency is added at TLS negotiation time:

before (v1.20.0), client hello is sent right away after TCP connection is established (same ms) and total time to open the TLS tunnel is 1 to 2ms
after (v1.21.1), client hello may take more than 50ms to be sent and total time to establish TLS tunnel is 75ms

# v1.20.0
No.     Time                          Source                Destination           Protocol Length Info
     15 2022-03-11 09:49:32,213970    172.18.0.5            172.18.0.3            TCP      74     60858 �~F~R 8000 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3972714836 TSecr=0 WS=128
     16 2022-03-11 09:49:32,214015    172.18.0.3            172.18.0.5            TCP      74     8000 �~F~R 60858 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM=1 TSval=1258288815 TSecr=3972714836 WS=128
     17 2022-03-11 09:49:32,214026    172.18.0.5            172.18.0.3            TCP      66     60858 �~F~R 8000 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3972714836 TSecr=1258288815
     18 2022-03-11 09:49:32,214154    172.18.0.5            172.18.0.3            TLSv1.3  583    Client Hello
     19 2022-03-11 09:49:32,214166    172.18.0.3            172.18.0.5            TCP      66     8000 �~F~R 60858 [ACK] Seq=1 Ack=518 Win=64768 Len=0 TSval=1258288816 TSecr=3972714837
     20 2022-03-11 09:49:32,214436    172.18.0.3            172.18.0.5            TLSv1.3  159    Hello Retry Request
     21 2022-03-11 09:49:32,214449    172.18.0.5            172.18.0.3            TCP      66     60858 �~F~R 8000 [ACK] Seq=518 Ack=94 Win=64256 Len=0 TSval=3972714837 TSecr=1258288816
     22 2022-03-11 09:49:32,214481    172.18.0.3            172.18.0.5            TLSv1.3  72     Change Cipher Spec
     23 2022-03-11 09:49:32,214486    172.18.0.5            172.18.0.3            TCP      66     60858 �~F~R 8000 [ACK] Seq=518 Ack=100 Win=64256 Len=0 TSval=3972714837 TSecr=1258288816
     24 2022-03-11 09:49:32,214597    172.18.0.5            172.18.0.3            TLSv1.3  589    Change Cipher Spec, Client Hello
     25 2022-03-11 09:49:32,214616    172.18.0.3            172.18.0.5            TCP      66     8000 �~F~R 60858 [ACK] Seq=100 Ack=1041 Win=64256 Len=0 TSval=1258288816 TSecr=3972714837
     26 2022-03-11 09:49:32,215107    172.18.0.3            172.18.0.5            TLSv1.3  485    Server Hello, Application Data, Application Data, Application Data
     27 2022-03-11 09:49:32,215115    172.18.0.5            172.18.0.3            TCP      66     60858 �~F~R 8000 [ACK] Seq=1041 Ack=519 Win=64128 Len=0 TSval=3972714837 TSecr=1258288816
     28 2022-03-11 09:49:32,215333    172.18.0.5            172.18.0.3            TLSv1.3  124    Application Data
     29 2022-03-11 09:49:32,215351    172.18.0.3            172.18.0.5            TCP      66     8000 �~F~R 60858 [ACK] Seq=519 Ack=1099 Win=64256 Len=0 TSval=1258288817 TSecr=3972714838
     30 2022-03-11 09:49:32,215486    172.18.0.5            172.18.0.3            TLSv1.3  369    Application Data
     31 2022-03-11 09:49:32,215499    172.18.0.3            172.18.0.5            TCP      66     8000 �~F~R 60858 [ACK] Seq=519 Ack=1402 Win=64128 Len=0 TSval=1258288817 TSecr=3972714838
     32 2022-03-11 09:49:32,215715    172.18.0.3            172.18.0.5            TLSv1.3  860    Application Data
     33 2022-03-11 09:49:32,215724    172.18.0.5            172.18.0.3            TCP      66     60858 �~F~R 8000 [ACK] Seq=1402 Ack=1313 Win=64128 Len=0 TSval=3972714838 TSecr=1258288817

# v1.21.1
No.     Time                          Source                Destination           Protocol Length Info
     25 2022-03-11 09:58:37,769340    172.19.0.5            172.19.0.3            TCP      74     56182 �~F~R 8000 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3597694218 TSecr=0 WS=128
     26 2022-03-11 09:58:37,769369    172.19.0.3            172.19.0.5            TCP      74     8000 �~F~R 56182 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM=1 TSval=4075772660 TSecr=3597694218 WS=128
     27 2022-03-11 09:58:37,769378    172.19.0.5            172.19.0.3            TCP      66     56182 �~F~R 8000 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3597694218 TSecr=4075772660
     30 2022-03-11 09:58:37,788284    172.19.0.5            172.19.0.3            TLSv1.3  258    Client Hello
     31 2022-03-11 09:58:37,788332    172.19.0.3            172.19.0.5            TCP      66     8000 �~F~R 56182 [ACK] Seq=1 Ack=193 Win=65024 Len=0 TSval=4075772679 TSecr=3597694237
     32 2022-03-11 09:58:37,796129    172.19.0.3            172.19.0.5            TLSv1.3  159    Hello Retry Request
     33 2022-03-11 09:58:37,796183    172.19.0.5            172.19.0.3            TCP      66     56182 �~F~R 8000 [ACK] Seq=193 Ack=94 Win=64256 Len=0 TSval=3597694245 TSecr=4075772686
     34 2022-03-11 09:58:37,796492    172.19.0.5            172.19.0.3            TLSv1.3  297    Change Cipher Spec, Client Hello
     35 2022-03-11 09:58:37,796551    172.19.0.3            172.19.0.5            TCP      66     8000 �~F~R 56182 [ACK] Seq=94 Ack=424 Win=64896 Len=0 TSval=4075772687 TSecr=3597694245
     36 2022-03-11 09:58:37,796574    172.19.0.3            172.19.0.5            TLSv1.3  72     Change Cipher Spec
     37 2022-03-11 09:58:37,796578    172.19.0.5            172.19.0.3            TCP      66     56182 �~F~R 8000 [ACK] Seq=424 Ack=100 Win=64256 Len=0 TSval=3597694245 TSecr=4075772687
     48 2022-03-11 09:58:37,814567    172.19.0.3            172.19.0.5            TLSv1.3  2561   Server Hello, Application Data, Application Data, Application Data, Application Data, Application Data
     49 2022-03-11 09:58:37,814594    172.19.0.5            172.19.0.3            TCP      66     56182 �~F~R 8000 [ACK] Seq=424 Ack=2595 Win=63744 Len=0 TSval=3597694263 TSecr=4075772705
     50 2022-03-11 09:58:37,815013    172.19.0.5            172.19.0.3            TLSv1.3  90     Application Data
     51 2022-03-11 09:58:37,815049    172.19.0.3            172.19.0.5            TCP      66     8000 �~F~R 56182 [ACK] Seq=2595

access_log:

{"duration":475,"response_flags":"UF","bytes_received":0,"remote_addr":"172.19.0.9","method":"GET","path":"/service03","user_agent":"curl/7.68.0","upstream":{"duration":null,"service":"service03","host":"172.19.0.2:8000"},"protocol":"HTTP/1.1","id":"bdc55196-1c65-4054-bc5c-f7450a781b26","response_code_details":"upstream_reset_before_response_started{connection_failure}","bytes_sent":91,"response_code":503,"source":"envoy","listener":"service","start_time":"2022-03-11T09:58:37.960+0000"}

error_log:

{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"initializing epoch 0 (base id=0, hot restart version=11.104)"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"statically linked extensions:"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.resolvers: envoy.ip"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.filters.listener: envoy.filters.listener.http_inspector, envoy.filters.listener.original_dst, envoy.filters.listener.original_src, envoy.filters.listener.proxy_protocol, envoy.filters.listener.tls_inspector, envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.health_checkers: envoy.health_checkers.redis"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.access_logger.extension_filters: envoy.access_loggers.extension_filters.cel"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.dubbo_proxy.protocols: dubbo"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.wasm.runtime: envoy.wasm.runtime.null, envoy.wasm.runtime.v8"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.common.key_value: envoy.key_value.file_based"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.rbac.matchers: envoy.rbac.matchers.upstream_ip_port"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.dynamic_ot, envoy.tracers.lightstep, envoy.tracers.opencensus, envoy.tracers.skywalking, envoy.tracers.xray, envoy.tracers.zipkin, envoy.zipkin"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.http.stateful_header_formatters: preserve_case"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.quic.server.crypto_stream: envoy.quic.crypto_stream.server.quiche"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.matching.input_matchers: envoy.matching.matchers.consistent_hashing, envoy.matching.matchers.ip"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.retry_priorities: envoy.retry_priorities.previous_priorities"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.http.stateful_session: envoy.http.stateful_session.cookie"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.request_id: envoy.request_id.uuid"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.filters.http: envoy.bandwidth_limit, envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.ext_proc, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.admission_control, envoy.filters.http.alternate_protocols_cache, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.bandwidth_limit, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cdn_loop, envoy.filters.http.composite, envoy.filters.http.compressor, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.decompressor, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.dynamo, envoy.filters.http.ext_authz, envoy.filters.http.ext_proc, envoy.filters.http.fault, envoy.filters.http.grpc_http1_bridge, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_json_transcoder, envoy.filters.http.grpc_stats, envoy.filters.http.grpc_web, envoy.filters.http.header_to_metadata, envoy.filters.http.health_check, envoy.filters.http.ip_tagging, envoy.filters.http.jwt_authn, envoy.filters.http.local_ratelimit, envoy.filters.http.lua, envoy.filters.http.oauth2, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.ratelimit, envoy.filters.http.rbac, envoy.filters.http.router, envoy.filters.http.set_metadata, envoy.filters.http.stateful_session, envoy.filters.http.tap, envoy.filters.http.wasm, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.local_rate_limit, envoy.lua, envoy.rate_limit, envoy.router, match-wrapper"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.omit_host_metadata, envoy.retry_host_predicates.previous_hosts"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.tls.cert_validator: envoy.tls.cert_validator.default, envoy.tls.cert_validator.spiffe"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.internal_redirect_predicates: envoy.internal_redirect_predicates.allow_listed_routes, envoy.internal_redirect_predicates.previous_routes, envoy.internal_redirect_predicates.safe_cross_scheme"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.client_ssl_auth, envoy.filters.network.connection_limit, envoy.filters.network.direct_response, envoy.filters.network.dubbo_proxy, envoy.filters.network.echo, envoy.filters.network.ext_authz, envoy.filters.network.http_connection_manager, envoy.filters.network.local_ratelimit, envoy.filters.network.mongo_proxy, envoy.filters.network.ratelimit, envoy.filters.network.rbac, envoy.filters.network.redis_proxy, envoy.filters.network.sni_cluster, envoy.filters.network.sni_dynamic_forward_proxy, envoy.filters.network.tcp_proxy, envoy.filters.network.thrift_proxy, envoy.filters.network.wasm, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.matching.action: composite-action, skip"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.matching.http.input: request-headers, request-trailers, response-headers, response-trailers"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.rate_limit_descriptors: envoy.rate_limit_descriptors.expr"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.upstreams: envoy.filters.connection_pools.tcp.generic"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.quic.proof_source: envoy.quic.proof_source.filter_chain"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.bootstrap: envoy.bootstrap.wasm, envoy.extensions.network.socket_interface.default_socket_interface"}
{"source":"envoy","date":"2022-03-11T09:56:49.261+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.stats_sinks: envoy.dog_statsd, envoy.graphite_statsd, envoy.metrics_service, envoy.stat_sinks.dog_statsd, envoy.stat_sinks.graphite_statsd, envoy.stat_sinks.hystrix, envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd, envoy.stat_sinks.wasm, envoy.statsd"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.access_loggers: envoy.access_loggers.file, envoy.access_loggers.http_grpc, envoy.access_loggers.open_telemetry, envoy.access_loggers.stderr, envoy.access_loggers.stdout, envoy.access_loggers.tcp_grpc, envoy.access_loggers.wasm, envoy.file_access_log, envoy.http_grpc_access_log, envoy.open_telemetry_access_log, envoy.stderr_access_log, envoy.stdout_access_log, envoy.tcp_grpc_access_log, envoy.wasm_access_log"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.dubbo_proxy.route_matchers: default"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.thrift_proxy.transports: auto, framed, header, unframed"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.compression.decompressor: envoy.compression.brotli.decompressor, envoy.compression.gzip.decompressor"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.upstream_options: envoy.extensions.upstreams.http.v3.HttpProtocolOptions, envoy.upstreams.http.http_protocol_options"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.matching.common_inputs: envoy.matching.common_inputs.environment_variable"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.quic, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.starttls, envoy.transport_sockets.tap, envoy.transport_sockets.tcp_stats, envoy.transport_sockets.tls, raw_buffer, starttls, tls"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.http.cache: envoy.extensions.http.cache.simple"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.http.original_ip_detection: envoy.http.original_ip_detection.custom_header, envoy.http.original_ip_detection.xff"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.network.dns_resolver: envoy.network.dns_resolver.cares"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.dubbo_proxy.serializers: dubbo.hessian2"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.dubbo_proxy.filters: envoy.filters.dubbo.router"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.compression.compressor: envoy.compression.brotli.compressor, envoy.compression.gzip.compressor"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.formatter: envoy.formatter.metadata, envoy.formatter.req_without_query"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.thrift_proxy.filters: envoy.filters.thrift.header_to_metadata, envoy.filters.thrift.rate_limit, envoy.filters.thrift.router"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.filters.udp_listener: envoy.filters.udp.dns_filter, envoy.filters.udp_listener.udp_proxy"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.quic, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.starttls, envoy.transport_sockets.tap, envoy.transport_sockets.tcp_stats, envoy.transport_sockets.tls, envoy.transport_sockets.upstream_proxy_protocol, raw_buffer, starttls, tls"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.guarddog_actions: envoy.watchdog.abort_action, envoy.watchdog.profile_action"}
{"source":"envoy","date":"2022-03-11T09:56:49.262+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource"}
{"source":"envoy","date":"2022-03-11T09:56:49.271+00:00","level":"info","component":"main","process":1,"thread":1,"message":"HTTP header map info:"}
{"source":"envoy","date":"2022-03-11T09:56:49.272+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  request header map: 640 bytes: :authority,:method,:path,:protocol,:scheme,accept,accept-encoding,access-control-request-method,authentication,authorization,cache-control,cdn-loop,connection,content-encoding,content-length,content-type,expect,grpc-accept-encoding,grpc-timeout,if-match,if-modified-since,if-none-match,if-range,if-unmodified-since,keep-alive,origin,pragma,proxy-connection,referer,te,transfer-encoding,upgrade,user-agent,via,x-client-trace-id,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-downstream-service-cluster,x-envoy-downstream-service-node,x-envoy-expected-rq-timeout-ms,x-envoy-external-address,x-envoy-force-trace,x-envoy-hedge-on-per-try-timeout,x-envoy-internal,x-envoy-ip-tags,x-envoy-max-retries,x-envoy-original-path,x-envoy-original-url,x-envoy-retriable-header-names,x-envoy-retriable-status-codes,x-envoy-retry-grpc-on,x-envoy-retry-on,x-envoy-upstream-alt-stat-name,x-envoy-upstream-rq-per-try-timeout-ms,x-envoy-upstream-rq-timeout-alt-response,x-envoy-upstream-rq-timeout-ms,x-envoy-upstream-stream-duration-ms,x-forwarded-client-cert,x-forwarded-for,x-forwarded-host,x-forwarded-proto,x-ot-span-context,x-request-id"}
{"source":"envoy","date":"2022-03-11T09:56:49.272+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  request trailer map: 128 bytes: "}
{"source":"envoy","date":"2022-03-11T09:56:49.272+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  response header map: 424 bytes: :status,access-control-allow-credentials,access-control-allow-headers,access-control-allow-methods,access-control-allow-origin,access-control-expose-headers,access-control-max-age,age,cache-control,connection,content-encoding,content-length,content-type,date,etag,expires,grpc-message,grpc-status,keep-alive,last-modified,location,proxy-connection,server,transfer-encoding,upgrade,vary,via,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-degraded,x-envoy-immediate-health-check-fail,x-envoy-ratelimited,x-envoy-upstream-canary,x-envoy-upstream-healthchecked-cluster,x-envoy-upstream-service-time,x-request-id"}
{"source":"envoy","date":"2022-03-11T09:56:49.272+00:00","level":"info","component":"main","process":1,"thread":1,"message":"  response trailer map: 152 bytes: grpc-message,grpc-status"}
{"source":"envoy","date":"2022-03-11T09:56:49.274+00:00","level":"info","component":"main","process":1,"thread":1,"message":"runtime: layers:\n  - name: static_layer_0\n    static_layer:\n      envoy:\n        resource_limits:\n          listener:\n            service:\n              connection_limit: 1024\n      overload:\n        global_downstream_max_connections: 1034"}
{"source":"envoy","date":"2022-03-11T09:56:49.274+00:00","level":"info","component":"admin","process":1,"thread":1,"message":"admin address: 0.0.0.0:8484"}
{"source":"envoy","date":"2022-03-11T09:56:49.274+00:00","level":"info","component":"config","process":1,"thread":1,"message":"loading tracing configuration"}
{"source":"envoy","date":"2022-03-11T09:56:49.274+00:00","level":"info","component":"config","process":1,"thread":1,"message":"loading 0 static secret(s)"}
{"source":"envoy","date":"2022-03-11T09:56:49.274+00:00","level":"info","component":"config","process":1,"thread":1,"message":"loading 12 cluster(s)"}
{"source":"envoy","date":"2022-03-11T09:56:49.284+00:00","level":"info","component":"config","process":1,"thread":1,"message":"loading 1 listener(s)"}
{"source":"envoy","date":"2022-03-11T09:56:49.287+00:00","level":"info","component":"config","process":1,"thread":1,"message":"loading stats configuration"}
{"source":"envoy","date":"2022-03-11T09:56:49.288+00:00","level":"info","component":"main","process":1,"thread":1,"message":"starting main dispatch loop"}
{"source":"envoy","date":"2022-03-11T09:56:49.289+00:00","level":"info","component":"runtime","process":1,"thread":1,"message":"RTDS has finished initialization"}
{"source":"envoy","date":"2022-03-11T09:56:49.289+00:00","level":"info","component":"upstream","process":1,"thread":1,"message":"cm init: all clusters initialized"}
{"source":"envoy","date":"2022-03-11T09:56:49.289+00:00","level":"info","component":"main","process":1,"thread":1,"message":"all clusters initialized. initializing init manager"}
{"source":"envoy","date":"2022-03-11T09:56:49.289+00:00","level":"info","component":"config","process":1,"thread":1,"message":"all dependencies initialized. starting workers"}

Call Stack: N/A

ggreenway commented 2 years ago

after (v1.21.1), client hello may take more than 50ms to be sent and total time to establish TLS tunnel is 75ms

As you've pointed out, it seems this is the change that's resulting in your timeouts.

Does this delay happen for every connection, or only sometimes.

Can you enable trace-level logging and post the logs for one connection?

cboitel commented 2 years ago

Envoy error log at trace level and tcpdump extract in text mode:

full envoy error.log
full tcpdump as text saved.txt

In these files:

in error.log, C4 matches first HTTP request received by envoy and C5 matches connection to TLS enabled service to process this connection
in saved.txt:
- packet 1 to 5: first request is received by envoy
- packet 6 to 8: a new TCP connection is established with cluster matching lines
- packet 17 and 19 is when envoy sends client hello 90ms later
since we have a 100ms timeout on connection (local network inside docker), it ends up in envoy returning a 503 and closing connection with cluster host

I have isolated inside the tcpdump packets related to that first connection:

These do match C4/C5 in error log

ggreenway commented 2 years ago

It's possible this is the same as #19717. The fix for that was just merged. Can you test the latest from main to see if that fixes it?

cboitel commented 2 years ago

I succesfully recompiled envoy: took some time and ressources...but was worth it.

Using the compiled binary, i can see our tests are now passing with connection_timeout set back to 0.1s. Looking$ at the tcpdump/... extracts, i can see now client hello triggering immediately as it was previously.

Will it be backported to v1.21.2 or will wait for upcoming v1.22.0 or both ? I believe this is a major impact and typically prevents us from upgrading at this stage.

mattklein123 commented 2 years ago

Yeah we should backport this. Will mark it as such.

mattklein123 commented 2 years ago

Going to close this and we can track the backport status in the merged PR. Thank you!

envoyproxy / envoy

Experiencing upstream connection failures since v1.21.1 #20311