envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Error when using TLS min version 1.3 with Envoy Proxy #36181

Open joel-vaz opened 2 days ago

joel-vaz commented 2 days ago

Hello,

I'm trying to update my service mesh (Consul + Envoy) to require a minimum TLS version of 1.3 across my cluster, up from 1.2.

I confirmed that both the Consul server and the Consul agent are correctly configured with a minimum TLS version of 1.3, but the Envoy proxy I use as a sidecar for my services is unhealthy and logs:

DeltaAggregatedResources gRPC config stream to local_agent closed since 97s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
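The number in that error is a packed BoringSSL error code (Envoy is built on BoringSSL, hence OPENSSL_internal), and unpacking it shows which side rejected the handshake. A minimal sketch, assuming BoringSSL's packing scheme (library id in bits 24 to 31, reason in the low 12 bits, and "received alert" reasons offset by 1000). The function name decode_boringssl_error is hypothetical:

```shell
# Hypothetical helper: unpack a BoringSSL error code such as the
# "268436526" in "TLS error: 268436526:SSL routines:...".
decode_boringssl_error() {
  local code=$1
  # BoringSSL stores the library id in bits 24..31 (16 = SSL routines)...
  local lib=$(( (code >> 24) & 0xff ))
  # ...and the reason code in the low 12 bits.
  local reason=$(( code & 0xfff ))
  echo "lib=${lib} reason=${reason}"
  # For the SSL library, reasons 1000..1255 mean "the peer sent TLS alert
  # number (reason - 1000)".
  if [ "$lib" -eq 16 ] && [ "$reason" -ge 1000 ] && [ "$reason" -le 1255 ]; then
    echo "peer_alert=$(( reason - 1000 ))"
  fi
}

# Usage:
#   decode_boringssl_error 268436526
```

For 268436526 this prints lib=16 reason=1070 and peer_alert=70. TLS alert 70 is protocol_version, so the Consul agent (the server side of this connection) refused the TLS version that Envoy offered.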

Consul Agent Configuration:

{
  "acl": {
    "enabled": true,
    "down_policy": "async-cache",
    "default_policy": "deny",
    "tokens": {
      "default": ""
    }
  },
  "enable_central_service_config": false,
  "datacenter": "",
  "encrypt": "",
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "server": false,
  "log_level": "INFO",
  "advertise_addr": "",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "data_dir": "/consul/data",
  "retry_join": [
    ""
  ],
  "auto_encrypt": {
    "tls": true,
    "ip_san": [
      ""
    ]
  },
  "tls": {  
    "defaults": {
      "ca_file": "/consul/ca.pem",
      "verify_outgoing": true,
      "verify_incoming": false,
      "tls_min_version": "TLSv1_3"
    },
    "internal_rpc": {
      "verify_server_hostname": true
    }
  },
  "leave_on_terminate": true,
  "ports": {
    "https": 8501,
    "http": -1,
    "grpc": 8502,
    "grpc_tls": 8503
  },
  "domain": "consul",
  "node_meta": {
    "env": "",
    "version": ""
  }
}

Envoy Service Configuration:

{
  "service": {
    "name": "",
    "id": "",
    "token": "",
    "address": "",
    "port": 0,
    "meta": {
      "env": "",
      "version": ""
    },
    "check": {
      "deregister_critical_service_after": "30m",
      "http": "",
      "method": "GET",
      "interval": "",
      "timeout": ""
    },
    "connect": {
      "sidecar_service": {
        "port": 21000,
        "checks": [
          {
            "name": "Connect Envoy Sidecar",
            "tcp": "",
            "interval": "10s"
          },
          {
            "id": "",
            "alias_service": ""
          }
        ],
        "proxy": {
          "config": {
            "envoy_stats_bind_addr": "0.0.0.0:19001",
            "envoy_tracing_json": "{\"http\":{\"name\":\"envoy.tracers.datadog\",\"typedConfig\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.DatadogConfig\",\"collector_cluster\":\"datadog_8126\",\"service_name\":\"%NAME%\"}}}",
            "envoy_extra_static_clusters_json": "{\"connect_timeout\":\"3.000s\",\"dns_lookup_family\":\"V4_ONLY\",\"lb_policy\":\"ROUND_ROBIN\",\"load_assignment\":{\"cluster_name\":\"datadog_8126\",\"endpoints\":[{\"lb_endpoints\":[{\"endpoint\":{\"address\":{\"socket_address\":{\"address\":\"%ADDRESS%\",\"port_value\":8126,\"protocol\":\"TCP\"}}}}]}]},\"name\":\"datadog_8126\",\"type\":\"STRICT_DNS\"}"
          },
          "upstreams": []
        }
      }
    }
  }
}

Could someone help me with this issue, please? Has anyone run into the same problem? 🙏

Kind Regards,
Joel Vaz

zuercher commented 2 days ago

When Envoy acts as a TLS client, its default maximum TLS version is TLS 1.2 (https://www.envoyproxy.io/docs/envoy/v1.31.1/api-v3/extensions/transport_sockets/tls/v3/common.proto). If you set the maximum to TLS 1.3 in your Envoy configuration, the connection should succeed.

The docs I linked are for the latest version, but the same is true in Envoy 1.26, which is no longer supported. You should upgrade.

joel-vaz commented 2 days ago

Hello @zuercher, first of all, thanks for your quick reply!

Secondly, could you help me understand where to configure that parameter? My Envoy configuration is generated by the consul connect envoy command in the snippet below. The snippet is part of our Docker container's entrypoint script; it uses the service configuration I sent above as a base template and fills in the correct environment variables:

set_proxy_configuration()
{
  ## Env variables code
  ##

  base_renderers=$(jq '.service.connect.sidecar_service.proxy.upstreams = '"${CONSUL_SERVICE_UPSTREAMS}"' |
      .service.name = "'${SERVICE_NAME}'" |
      .service.id = "'${SERVICE_ID}'" |
      .service.token = "'${CONSUL_HTTP_TOKEN}'" |
      .service.address = "'${CONTAINER_IP}'" |
      .service.port = '${SERVICE_PORT}' |
      .service.meta.env = "'${DD_ENV}'" |
      .service.meta.version = "'${DD_VERSION}'" |
      .service.connect.sidecar_service.port = '${SIDECAR_PORT}' |
      .service.check.http = "'${SERVICE_HEALTH_CHECK}'" |
      .service.check.interval = "'${SERVICE_HEALTH_CHECK_INTERVAL}'" |
      .service.check.timeout = "'${SERVICE_HEALTH_CHECK_TIMEOUT}'" |
      .service.connect.sidecar_service.checks[0].tcp = "'${SIDECAR_HEALTH_CHECK}'" |
      .service.connect.sidecar_service.checks[1].id = "'${SERVICE_ID}'-alias" |
      .service.connect.sidecar_service.checks[1].alias_service = "'${SERVICE_ID}'" |
      .service.connect.sidecar_service.proxy.config.envoy_tracing_json |=gsub("%NAME%";"'$DD_SERVICE'") |
      .service.connect.sidecar_service.proxy.config.envoy_extra_static_clusters_json |=gsub("%ADDRESS%";"'$EC2_HOST_ADDRESS'")' ./service_config.json )
  echo "Base Renderers configuration: $base_renderers"

  # Wait until Consul can be contacted
  until curl -s -k "${CONSUL_HTTP_ADDR}/v1/status/leader" | grep ***; do
    echo "Waiting for Consul to start at ${CONSUL_HTTP_ADDR}."
    sleep 1
  done

  echo "Registering service with consul ${SERVICE_CONFIG_FILE}."
  consul services register "${SERVICE_CONFIG_FILE}"

  # $ENVOY_DEBUG stays unquoted so it can expand to zero or more extra flags
  consul connect envoy -sidecar-for="${SERVICE_ID}" -grpc-ca-file="${CONSUL_CACERT}" $ENVOY_DEBUG &
}

I've spent a few days trying to make Envoy use TLS 1.3 as the minimum version, including working through the documentation you linked above, but with no effect: I always end up with the TLS version mismatch.

Concerning the version: we plan to move to a stable Envoy release, but we are upgrading incrementally, since this service went without updates for a few years.

Kind Regards, Joel Vaz

zuercher commented 2 days ago

Ultimately, the Envoy configuration must define a static cluster that specifies how to connect to the xDS server (provided by Consul). That cluster definition will have a transport_socket field configured with an UpstreamTlsContext object. Within that UpstreamTlsContext, you'll need to make sure that common_tls_context.tls_params.tls_maximum_protocol_version is set to TLSv1_3. How to get consul connect envoy to generate that config is a question for a Consul forum.
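The change described in this comment can be sketched as a post-processing step on the bootstrap that consul connect envoy generates. This assumes jq (already used in the entrypoint script), that consul connect envoy -bootstrap prints the bootstrap JSON instead of starting Envoy, and that the xDS cluster is named local_agent (as in the error log) and already carries an UpstreamTlsContext transport_socket. The function name pin_local_agent_tls13 is hypothetical, and the result should be checked against the bootstrap your Consul version actually emits:

```shell
# Hypothetical helper: read an Envoy bootstrap (JSON) on stdin and write it
# back with the "local_agent" cluster (the xDS stream to the Consul agent)
# restricted to TLS 1.3. Requires jq. Assumes that cluster already has a
# transport_socket whose typed_config is an UpstreamTlsContext; other
# clusters are left untouched.
pin_local_agent_tls13() {
  jq '
    (.static_resources.clusters[]
      | select(.name == "local_agent")
      | .transport_socket.typed_config.common_tls_context.tls_params)
    |= ((. // {}) + {
          tls_minimum_protocol_version: "TLSv1_3",
          tls_maximum_protocol_version: "TLSv1_3"
        })
  '
}

# Possible use in the entrypoint (hypothetical; adjust Envoy flags as needed):
#   consul connect envoy -sidecar-for="${SERVICE_ID}" \
#     -grpc-ca-file="${CONSUL_CACERT}" -bootstrap \
#     | pin_local_agent_tls13 > /tmp/envoy-bootstrap.json
#   envoy -c /tmp/envoy-bootstrap.json
```

Raising the maximum to TLSv1_3 is the part that lifts Envoy's client default of TLS 1.2; also setting the minimum keeps this one connection from negotiating anything below 1.3.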

joel-vaz commented 13 hours ago

Hey @zuercher

I updated the service configuration file for my Envoy proxy:

{
  "service": {
    "name": <value_hidden>,
    "id": <value_hidden>,
    "token": <value_hidden>,
    "address": <value_hidden>,
    "port": <value_hidden>,
    "meta": {
      "env": <value_hidden>,
      "version": <value_hidden>
    },
    "check": {
      "deregister_critical_service_after": "30m",
      "http": <value_hidden>,
      "method": "GET",
      "interval": "1s",
      "timeout": "1s"
    },
    "connect": {
      "sidecar_service": {
        "port": <value_hidden>,
        "checks": [
          {
            "name": "Connect Envoy Sidecar",
            "tcp": <value_hidden>,
            "interval": "10s"
          },
          {
            "id": "<value_hidden>",
            "alias_service": "<value_hidden>"
          }
        ],
        "proxy": {
          "config": {
            "envoy_stats_bind_addr": "<value_hidden>",
            "envoy_tracing_json": "{\"http\":{\"name\":\"envoy.tracers.datadog\",\"typedConfig\":{\"@type\":\"type.googleapis.com/envoy.config.trace.v3.DatadogConfig\",\"collector_cluster\":\"datadog_8126\",\"service_name\":\"<value_hidden>\"}}}",
            "envoy_extra_static_clusters_json": "{\"connect_timeout\":\"3.000s\",\"dns_lookup_family\":\"V4_ONLY\",\"lb_policy\":\"ROUND_ROBIN\",\"load_assignment\":{\"cluster_name\":\"datadog_8126\",\"endpoints\":[{\"lb_endpoints\":[{\"endpoint\":{\"address\":{\"socket_address\":{\"address\":\"<value_hidden>\",\"port_value\":<value_hidden>,\"protocol\":\"TCP\"}}}}]}]},\"name\":\"datadog_8126\",\"type\":\"STRICT_DNS\"}",
            "common_tls_context": {
              "tls_params": {
                "tls_minimum_protocol_version": "TLSv1_3"
              }
            }
          },
          "upstreams": []
        }
      }
    }
  }
}

I still get the same error, with Envoy defaulting to TLS 1.2. Is something missing from the Envoy service configuration file? 🤔

Additional logs:

[2024-09-19 09:51:55.555][37][info][main] [source/server/server.cc:456] HTTP header map info:
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   request header map: 672 bytes: :authority,:method,:path,:protocol,:scheme,accept,accept-encoding,access-control-request-headers,access-control-request-method,access-control-request-private-network,authentication,authorization,cache-control,cdn-loop,connection,content-encoding,content-length,content-type,expect,grpc-accept-encoding,grpc-timeout,if-match,if-modified-since,if-none-match,if-range,if-unmodified-since,keep-alive,origin,pragma,proxy-connection,proxy-status,referer,te,transfer-encoding,upgrade,user-agent,via,x-client-trace-id,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-downstream-service-cluster,x-envoy-downstream-service-node,x-envoy-expected-rq-timeout-ms,x-envoy-external-address,x-envoy-force-trace,x-envoy-hedge-on-per-try-timeout,x-envoy-internal,x-envoy-ip-tags,x-envoy-is-timeout-retry,x-envoy-max-retries,x-envoy-original-path,x-envoy-original-url,x-envoy-retriable-header-names,x-envoy-retriable-status-codes,x-envoy-retry-grpc-on,x-envoy-retry-on,x-envoy-upstream-alt-stat-name,x-envoy-upstream-rq-per-try-timeout-ms,x-envoy-upstream-rq-timeout-alt-response,x-envoy-upstream-rq-timeout-ms,x-envoy-upstream-stream-duration-ms,x-forwarded-client-cert,x-forwarded-for,x-forwarded-host,x-forwarded-port,x-forwarded-proto,x-ot-span-context,x-request-id
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   request trailer map: 120 bytes:
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   response header map: 432 bytes: :status,access-control-allow-credentials,access-control-allow-headers,access-control-allow-methods,access-control-allow-origin,access-control-allow-private-network,access-control-expose-headers,access-control-max-age,age,cache-control,connection,content-encoding,content-length,content-type,date,etag,expires,grpc-message,grpc-status,keep-alive,last-modified,location,proxy-connection,proxy-status,server,transfer-encoding,upgrade,vary,via,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-degraded,x-envoy-immediate-health-check-fail,x-envoy-ratelimited,x-envoy-upstream-canary,x-envoy-upstream-healthchecked-cluster,x-envoy-upstream-service-time,x-request-id
[2024-09-19 09:51:55.556][37][info][main] [source/server/server.cc:459]   response trailer map: 144 bytes: grpc-message,grpc-status
[2024-09-19 09:51:55.562][37][info][main] [source/server/server.cc:827] runtime: layers:
  - name: base
    static_layer:
      re2.max_program_size.error_level: 1048576
[2024-09-19 09:51:55.562][37][info][admin] [source/server/admin/admin.cc:66] admin address: 127.0.0.1:19000
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:131] loading tracing configuration
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:142]   validating default server-wide tracing driver: envoy.tracers.datadog
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:91] loading 0 static secret(s)
[2024-09-19 09:51:55.563][37][info][config] [source/server/configuration_impl.cc:97] loading 3 cluster(s)
[2024-09-19 09:51:55.612][37][info][config] [source/server/configuration_impl.cc:101] loading 1 listener(s)
[2024-09-19 09:51:55.614][37][info][config] [source/server/configuration_impl.cc:113] loading stats configuration
[2024-09-19 09:51:55.614][37][info][runtime] [source/common/runtime/runtime_impl.cc:463] RTDS has finished initialization
[2024-09-19 09:51:55.614][37][info][upstream] [source/common/upstream/cluster_manager_impl.cc:221] cm init: initializing cds
[2024-09-19 09:51:55.614][37][warning][main] [source/server/server.cc:802] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2024-09-19 09:51:55.615][37][info][main] [source/server/server.cc:923] starting main dispatch loop
[2024-09-19 09:52:48.043][37][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 52s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
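Whether a setting placed under proxy.config actually reaches the local_agent cluster can be checked on the running proxy through Envoy's admin endpoint (127.0.0.1:19000 in the log above) and its /config_dump output. A minimal sketch, assuming jq and the v3 admin format in which static clusters appear under a ClustersConfigDump entry. The function name local_agent_tls_params is hypothetical:

```shell
# Hypothetical helper: read Envoy's /config_dump JSON on stdin and print the
# tls_params of the static "local_agent" cluster as compact JSON.
# Prints "null" when no tls_params are set, i.e. Envoy's client defaults apply.
local_agent_tls_params() {
  jq -c '
    .configs[]
    | select(.["@type"] == "type.googleapis.com/envoy.admin.v3.ClustersConfigDump")
    | .static_clusters[]?
    | .cluster
    | select(.name == "local_agent")
    | .transport_socket.typed_config.common_tls_context.tls_params
  '
}

# Usage against the running sidecar:
#   curl -s http://127.0.0.1:19000/config_dump | local_agent_tls_params
```

If this prints null after the proxy.config change, the setting likely never reached the xDS cluster, which would be consistent with the unchanged error.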

Thank you so much for the help you are providing 🙏