envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.95k stars 4.8k forks source link

Envoy fails to bind to a port after hot restart when the port is used for both UDP and TCP #20584

Closed jparklab closed 2 years ago

jparklab commented 2 years ago

Title: Envoy fails to bind to a port after hot restart when the port is used for both UDP and TCP

Description: After a hot restart, envoy fails to finish initialization with the following error message

[2022-03-30 12:10:43.559][25804][error][config] [source/server/listener_manager_impl.cc:598] final pre-worker listener init for listener 'tcp_listener' failed: cannot listen() errno=95

It happens to listener using TCP protocol that binds to a port that is also bound to by another listener using UDP protocol.

Repro steps:

The issue can be reproduced by running envoy with a configuration that has a UDP listener and a TCP listener that bind to the same address, and hot restart it.

  1. Start envoy using hot-restarter script
    # see below for the content of start-service.sh 
    hot-restarter.py start-service.sh
  2. Hot restart envoy
    kill -HUP $(pgrep -f hot-restarter)

Here's the envoy configuration, and start-service.sh I used to reproduce the issue

envoy-config.yaml:

# envoy-config.yaml
static_resources:
  listeners:
    - name: udp_listener
      address:
        socket_address:
          protocol: UDP
          address: 127.0.0.3
          port_value: 10000
      listener_filters:
        - name: envoy.filters.udp_listener.udp_proxy
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig
            stat_prefix: service
            cluster: service_udp
    - name: tcp_listener
      address:
        socket_address:
          address: 127.0.0.3
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: tcp_listener
                cluster: tcp_cluster
  clusters:
    - name: udp_cluster
      connect_timeout: 0.25s
      type: STATIC
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: udp_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.3
                      port_value: 20000
    - name: tcp_cluster
      connect_timeout: 0.25s
      type: STATIC
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: tcp_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.3
                      port_value: 20000

start-service.sh:

#!/bin/bash
exec envoy-static -c ./envoy-config.yaml --restart-epoch $RESTART_EPOCH

It appears that it only happens when the UDP listener binds to the port before the TCP listener. I can always reproduce the issue when I run the envoy with UDP listener only, and hot restart it after adding TCP listener. The issue does not happen when I run envoy with TCP listener only and hot restart after adding UDP listener.

Logs:

[2022-03-30 12:32:54,883][31853][INFO][hot-restarter.py:109] got SIGHUP
[2022-03-30 12:32:54,883][31853][INFO][hot-restarter.py:183] forking and execing new child process at epoch 1
[2022-03-30 12:32:54,884][31853][INFO][hot-restarter.py:192] forked new child process with PID=32675
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:389] initializing epoch 1 (base id=0, hot restart version=11.104)
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:392] statically linked extensions:
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.wasm.runtime: envoy.wasm.runtime.null, envoy.wasm.runtime.v8
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.matching.network.custom_matchers: envoy.matching.custom_matchers.trie_matcher
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.resolvers: envoy.ip
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.matching.http.input: envoy.matching.inputs.request_headers, envoy.matching.inputs.request_trailers, envoy.matching.inputs.response_headers, envoy.matching.inputs.response_trailers
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.omit_host_metadata, envoy.retry_host_predicates.previous_hosts
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.http.cache: envoy.extensions.http.cache.simple
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.matching.action: composite-action, skip
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.dubbo_proxy.protocols: dubbo
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.filters.udp_listener: envoy.filters.udp.dns_filter, envoy.filters.udp_listener.udp_proxy
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.thrift_proxy.transports: auto, framed, header, unframed
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.rate_limit_descriptors: envoy.rate_limit_descriptors.expr
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.request_id: envoy.request_id.uuid
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.dubbo_proxy.serializers: dubbo.hessian2
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.health_checkers: envoy.health_checkers.redis
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.config.validators: envoy.config.validators.minimum_clusters, envoy.config.validators.minimum_clusters_validator
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.compression.decompressor: envoy.compression.brotli.decompressor, envoy.compression.gzip.decompressor
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.stats_sinks: envoy.dog_statsd, envoy.graphite_statsd, envoy.metrics_service, envoy.stat_sinks.dog_statsd, envoy.stat_sinks.graphite_statsd, envoy.stat_sinks.hystrix, envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd, envoy.stat_sinks.wasm, envoy.statsd
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.filters.listener: envoy.filters.listener.http_inspector, envoy.filters.listener.original_dst, envoy.filters.listener.original_src, envoy.filters.listener.proxy_protocol, envoy.filters.listener.tls_inspector, envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.quic, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.starttls, envoy.transport_sockets.tap, envoy.transport_sockets.tcp_stats, envoy.transport_sockets.tls, raw_buffer, starttls, tls
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.bootstrap: envoy.bootstrap.wasm, envoy.extensions.network.socket_interface.default_socket_interface
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.quic.proof_source: envoy.quic.proof_source.filter_chain
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.access_logger.extension_filters: envoy.access_loggers.extension_filters.cel
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.upstream_options: envoy.extensions.upstreams.http.v3.HttpProtocolOptions, envoy.upstreams.http.http_protocol_options
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.internal_redirect_predicates: envoy.internal_redirect_predicates.allow_listed_routes, envoy.internal_redirect_predicates.previous_routes, envoy.internal_redirect_predicates.safe_cross_scheme
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.network.dns_resolver: envoy.network.dns_resolver.cares
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.http.stateful_header_formatters: preserve_case
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.formatter: envoy.formatter.metadata, envoy.formatter.req_without_query
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.quic.server.crypto_stream: envoy.quic.crypto_stream.server.quiche
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.access_loggers: envoy.access_loggers.file, envoy.access_loggers.http_grpc, envoy.access_loggers.open_telemetry, envoy.access_loggers.stderr, envoy.access_loggers.stdout, envoy.access_loggers.tcp_grpc, envoy.access_loggers.wasm, envoy.file_access_log, envoy.http_grpc_access_log, envoy.open_telemetry_access_log, envoy.stderr_access_log, envoy.stdout_access_log, envoy.tcp_grpc_access_log, envoy.wasm_access_log
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.rbac.matchers: envoy.rbac.matchers.upstream_ip_port
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.client_ssl_auth, envoy.filters.network.connection_limit, envoy.filters.network.direct_response, envoy.filters.network.dubbo_proxy, envoy.filters.network.echo, envoy.filters.network.ext_authz, envoy.filters.network.http_connection_manager, envoy.filters.network.local_ratelimit, envoy.filters.network.mongo_proxy, envoy.filters.network.ratelimit, envoy.filters.network.rbac, envoy.filters.network.redis_proxy, envoy.filters.network.sni_cluster, envoy.filters.network.sni_dynamic_forward_proxy, envoy.filters.network.tcp_proxy, envoy.filters.network.thrift_proxy, envoy.filters.network.wasm, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.common.key_value: envoy.key_value.file_based
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.http.original_ip_detection: envoy.http.original_ip_detection.custom_header, envoy.http.original_ip_detection.xff
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.matching.input_matchers: envoy.matching.matchers.consistent_hashing, envoy.matching.matchers.ip
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.compression.compressor: envoy.compression.brotli.compressor, envoy.compression.gzip.compressor
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.guarddog_actions: envoy.watchdog.abort_action, envoy.watchdog.profile_action
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.dynamic_ot, envoy.tracers.lightstep, envoy.tracers.opencensus, envoy.tracers.skywalking, envoy.tracers.xray, envoy.tracers.zipkin, envoy.zipkin
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.filters.http: envoy.bandwidth_limit, envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.ext_proc, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.admission_control, envoy.filters.http.alternate_protocols_cache, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.bandwidth_limit, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cdn_loop, envoy.filters.http.composite, envoy.filters.http.compressor, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.decompressor, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.dynamo, envoy.filters.http.ext_authz, envoy.filters.http.ext_proc, envoy.filters.http.fault, envoy.filters.http.grpc_http1_bridge, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_json_transcoder, envoy.filters.http.grpc_stats, envoy.filters.http.grpc_web, envoy.filters.http.header_to_metadata, envoy.filters.http.health_check, envoy.filters.http.ip_tagging, envoy.filters.http.jwt_authn, envoy.filters.http.local_ratelimit, envoy.filters.http.lua, envoy.filters.http.oauth2, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.ratelimit, envoy.filters.http.rbac, envoy.filters.http.router, envoy.filters.http.set_metadata, envoy.filters.http.stateful_session, envoy.filters.http.tap, envoy.filters.http.wasm, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.local_rate_limit, envoy.lua, envoy.rate_limit, envoy.router, match-wrapper
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.matching.common_inputs: envoy.matching.common_inputs.environment_variable
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.upstreams: envoy.filters.connection_pools.tcp.generic
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.http.stateful_session: envoy.http.stateful_session.cookie
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.quic, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.starttls, envoy.transport_sockets.tap, envoy.transport_sockets.tcp_stats, envoy.transport_sockets.tls, envoy.transport_sockets.upstream_proxy_protocol, raw_buffer, starttls, tls
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.tls.cert_validator: envoy.tls.cert_validator.default, envoy.tls.cert_validator.spiffe
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.dubbo_proxy.route_matchers: default
[2022-03-30 12:32:55.136][32675][info][main] [source/server/server.cc:394]   envoy.thrift_proxy.filters: envoy.filters.thrift.header_to_metadata, envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2022-03-30 12:32:55.152][32675][info][main] [source/server/server.cc:442] HTTP header map info:
[2022-03-30 12:32:55.154][32675][info][main] [source/server/server.cc:444]   request header map: 656 bytes: :authority,:method,:path,:protocol,:scheme,accept,accept-encoding,access-control-request-headers,access-control-request-method,authentication,authorization,cache-control,cdn-loop,connection,content-encoding,content-length,content-type,expect,grpc-accept-encoding,grpc-timeout,if-match,if-modified-since,if-none-match,if-range,if-unmodified-since,keep-alive,origin,pragma,proxy-connection,proxy-status,referer,te,transfer-encoding,upgrade,user-agent,via,x-client-trace-id,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-downstream-service-cluster,x-envoy-downstream-service-node,x-envoy-expected-rq-timeout-ms,x-envoy-external-address,x-envoy-force-trace,x-envoy-hedge-on-per-try-timeout,x-envoy-internal,x-envoy-ip-tags,x-envoy-max-retries,x-envoy-original-path,x-envoy-original-url,x-envoy-retriable-header-names,x-envoy-retriable-status-codes,x-envoy-retry-grpc-on,x-envoy-retry-on,x-envoy-upstream-alt-stat-name,x-envoy-upstream-rq-per-try-timeout-ms,x-envoy-upstream-rq-timeout-alt-response,x-envoy-upstream-rq-timeout-ms,x-envoy-upstream-stream-duration-ms,x-forwarded-client-cert,x-forwarded-for,x-forwarded-host,x-forwarded-proto,x-ot-span-context,x-request-id
[2022-03-30 12:32:55.154][32675][info][main] [source/server/server.cc:444]   request trailer map: 128 bytes: 
[2022-03-30 12:32:55.154][32675][info][main] [source/server/server.cc:444]   response header map: 432 bytes: :status,access-control-allow-credentials,access-control-allow-headers,access-control-allow-methods,access-control-allow-origin,access-control-expose-headers,access-control-max-age,age,cache-control,connection,content-encoding,content-length,content-type,date,etag,expires,grpc-message,grpc-status,keep-alive,last-modified,location,proxy-connection,proxy-status,server,transfer-encoding,upgrade,vary,via,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-degraded,x-envoy-immediate-health-check-fail,x-envoy-ratelimited,x-envoy-upstream-canary,x-envoy-upstream-healthchecked-cluster,x-envoy-upstream-service-time,x-request-id
[2022-03-30 12:32:55.154][32675][info][main] [source/server/server.cc:444]   response trailer map: 152 bytes: grpc-message,grpc-status
[2022-03-30 12:32:55.161][31854][warning][main] [source/server/server.cc:959] shutting down admin due to child startup
[2022-03-30 12:32:55.161][31854][warning][main] [source/server/server.cc:965] terminating parent process
[2022-03-30 12:32:55.174][32675][info][main] [source/server/server.cc:786] runtime: {}
[2022-03-30 12:32:55.176][32675][warning][main] [source/server/server.cc:643] No admin address given, so no admin HTTP server started.
[2022-03-30 12:32:55.177][32675][info][config] [source/server/configuration_impl.cc:127] loading tracing configuration
[2022-03-30 12:32:55.177][32675][info][config] [source/server/configuration_impl.cc:87] loading 0 static secret(s)
[2022-03-30 12:32:55.177][32675][info][config] [source/server/configuration_impl.cc:93] loading 2 cluster(s)
[2022-03-30 12:32:55.196][32675][info][config] [source/server/configuration_impl.cc:97] loading 2 listener(s)
[2022-03-30 12:32:55.197][32675][warning][misc] [source/common/network/utility.cc:759] GRO requested but not supported by the OS. Check OS config or disable prefer_gro.
[2022-03-30 12:32:55.206][32675][info][config] [source/server/configuration_impl.cc:109] loading stats configuration
[2022-03-30 12:32:55.206][32675][info][runtime] [source/common/runtime/runtime_impl.cc:462] RTDS has finished initialization
[2022-03-30 12:32:55.206][32675][info][upstream] [source/common/upstream/cluster_manager_impl.cc:209] cm init: all clusters initialized
[2022-03-30 12:32:55.206][32675][warning][main] [source/server/server.cc:758] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2022-03-30 12:32:55.208][32675][info][main] [source/server/server.cc:863] all clusters initialized. initializing init manager
[2022-03-30 12:32:55.208][32675][info][config] [source/server/listener_manager_impl.cc:788] all dependencies initialized. starting workers
[2022-03-30 12:32:55.208][32675][error][config] [source/server/listener_manager_impl.cc:598] final pre-worker listener init for listener 'tcp_listener' failed: cannot listen() errno=95
[2022-03-30 12:32:55.222][32675][info][main] [source/server/server.cc:882] starting main dispatch loop

I was able to reproduce the issue with envoy 1.20.0, and envoy built from the HEAD.

adisuissa commented 2 years ago

cc @mattklein123 @alyssawilk

mattklein123 commented 2 years ago

I think the issue is this function needs to be aware of TCP vs. UDP:

https://github.com/envoyproxy/envoy/blob/ff24017783df8cb0f342d7532b565bd0cd54f106/source/server/hot_restarting_parent.cc#L111

I think TCP vs. UDP is being passed across the hot restart RPC but AFAICT it's being stripped before doing the actual lookup.

jparklab commented 2 years ago

@mattklein123 I think you are right. I no longer see the issue when I change the if statement as below to compare socket type

    if (*socket_factory.localAddress() == *addr && listener.get().bindToPort() &&
        (socket_factory.socketType() == Network::Socket::Type::Stream) ==
            Network::Utility::urlIsTcpScheme(request.pass_listen_socket().address())) {
soulxu commented 2 years ago

@jparklab are you going to fix this? if not, I happy to help here

jparklab commented 2 years ago

@soulxu I will create a PR with a fix to check the socket type. I'll tag you for the review

jparklab commented 2 years ago

@soulxu I created a PR: https://github.com/envoyproxy/envoy/pull/20605, can you take a look?

mattklein123 commented 2 years ago

Fixed by https://github.com/envoyproxy/envoy/pull/20605