envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Incorrect SNI set for different endpoints that live on the same host #18897

Closed sebas2day closed 2 years ago

sebas2day commented 2 years ago

Title: Incorrect SNI set for different endpoints that live on the same host

Description: We have Envoy proxying requests to endpoints using a request header. All proxied requests need to use TLS. Endpoints share the same certificate *.example.com.

Our setup consists of a single STRICT_DNS cluster with multiple lb_endpoints, selected through subset_selectors based on the request header. To get the correct SNI set we can't use hostname on the cluster endpoint, because auto_sni is based on the downstream host header, which means we have to construct the host header with a Lua filter. The first request sets the correct SNI for its endpoint, but when we make another request to a different endpoint (that lives on the same host) it somehow reuses the SNI of the initial request. Observing the logs, it does seem to be establishing new connections (so not reusing them?). This results in 421 HTTP responses in our setup.

Reproduction scenario:

node:
  id: envoy_example
  cluster: envoy_example
static_resources:
  listeners:
    - name: envoy_proxy
      address:
        socket_address:
          address: '0.0.0.0'
          port_value: '8080'
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: test
                route_config:
                  name: route_configuration
                  virtual_hosts:
                    - name: envoy_host
                      domains: [ "*" ]
                      routes:
                        - name: some_route
                          match:
                            prefix: "/"
                          route:
                            cluster: "example_application"
                http_filters:
                  - name: envoy.filters.http.lua
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
                      inline_code: |
                        function envoy_on_request(request_handle)
                          local service = request_handle:headers():get("service")
                          request_handle:headers():replace(":authority", service .. ".example.com")
                        end
                  - name: envoy.filters.http.header_to_metadata
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.header_to_metadata.v3.Config
                      request_rules:
                        - header: "service"
                          on_header_present:
                            metadata_namespace: envoy.lb
                            key: service
                            type: STRING
                          remove: false
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                      suppress_envoy_headers: true
  clusters:
    - name: "example_application"
      type: STRICT_DNS
      connect_timeout: 1s
      load_assignment:
        cluster_name: "example_application"
        endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: "service1.example.com"
                    port_value: 8002
                hostname: "service1.example.com"
              metadata:
                filter_metadata:
                  envoy.lb:
                    service: "service1"
            - endpoint:
                address:
                  socket_address:
                    address: "service2.example.com"
                    port_value: 8002
                hostname: "service2.example.com"
              metadata:
                filter_metadata:
                  envoy.lb:
                    service: "service2"
      lb_subset_config:
        fallback_policy: NO_FALLBACK
        subset_selectors:
          - keys: [ "service" ]
            single_host_per_subset: true
      transport_socket:
        name: envoy.transport_sockets.tls
      upstream_http_protocol_options:
        auto_sni: true

Nginx

user  nginx;
worker_processes  auto;

pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  text/plain;

    keepalive_timeout  65;

    server {
        listen 8002 ssl;
        server_name service1.example.com;

        ssl_certificate /ssl/cert.pem;
        ssl_certificate_key /ssl/key.pem;

        location / {
            return 200 'service1 sni: "$ssl_server_name" host: "$http_host"';
        }
    }

    server {
        listen 8002 ssl;
        server_name service2.example.com;

        ssl_certificate /ssl/cert.pem;
        ssl_certificate_key /ssl/key.pem;

        location / {
            return 200 'service2 sni: "$ssl_server_name" host: "$http_host"';
        }
    }
}

Hosts

172.26.0.3  service1.example.com
172.26.0.3  service2.example.com

Run

mkdir -p ssl
openssl req -x509 -nodes -newkey rsa:4096 -keyout $PWD/ssl/key.pem -out $PWD/ssl/cert.pem -sha256 -subj "/C=US/ST=Oregon/L=Portland/O=Company Name/OU=Org/CN=*.example.com" -days 360

docker network create testenvoy --subnet 172.26.0.0/16 --gateway 172.26.0.1
docker run --rm -d -p 8080:8080 --name envoy --net testenvoy -v $PWD/envoy.yaml:/etc/envoy/envoy.yaml:ro -v $PWD/hosts:/etc/hosts:ro --ip 172.26.0.2 envoyproxy/envoy-alpine:v1.20.0 envoy --config-path /etc/envoy/envoy.yaml --log-level debug
docker run --rm -d -p 8002:8002 --name application --net testenvoy -v $PWD/nginx.conf:/etc/nginx/nginx.conf:ro -v $PWD/ssl:/ssl --ip 172.26.0.3 nginx

curl -sS localhost:8080/test -H 'service:service1'
curl -sS localhost:8080/test -H 'service:service2'

The last call will show the SNI of the previous call.

Shikugawa commented 2 years ago

From my quick investigation, this problem may be caused by incorrect TLS session management for endpoints that share the same IP address. (I haven't reached the root cause yet, so further investigation is needed.) https://github.com/envoyproxy/envoy/blob/351c0ca82e28e19750102cfc1beb5eca8c4f2542/source/extensions/transport_sockets/tls/context_impl.cc#L666-L669

lizan commented 2 years ago

In this case you should configure a separate cluster for each of those services; a cluster is a collection of endpoints of the same logical service, and that doesn't seem to be the case here. What's the reason for putting those endpoints in the same cluster?
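
For illustration, a per-service split of the reproduction config might look roughly like the sketch below (route/cluster names are just illustrative, and each cluster pins its own SNI instead of relying on auto_sni; repeat the pair for service2 and so on):

routes:
  - name: service1_route
    match:
      prefix: "/"
      headers:
        - name: "service"
          exact_match: "service1"   # header-based route selection instead of subset LB
    route:
      cluster: "service1_application"

clusters:
  - name: "service1_application"
    type: STRICT_DNS
    connect_timeout: 1s
    load_assignment:
      cluster_name: "service1_application"
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: "service1.example.com"
                    port_value: 8002
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        sni: "service1.example.com"  # fixed per-cluster SNI, so auto_sni is not needed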

sebas2day commented 2 years ago

In this case you should configure a separate cluster for each of those services; a cluster is a collection of endpoints of the same logical service, and that doesn't seem to be the case here. What's the reason for putting those endpoints in the same cluster?

service is just used as an example here, much like the rest of this example being all static. In reality we have a dynamic number of clusters, one per application; the cluster with lb_endpoints here represents one application. We actually tried making a separate cluster for each possible value of service, but that causes a different problem: because service can have around 3000 different values, you end up with 3000 clusters for a single application. Envoy's startup time drastically slows down and the allocated memory runs into many GBs. It works, but it's undesirable.

Using lb_endpoints seemed like a good solution but I don't understand why it reuses the SNI of the previous call even though the host header is actually different.

Shikugawa commented 2 years ago

I'm not sure this problem should be resolved at the code level. In any case, it is caused by TLS session resumption on the same IP address, so it can be worked around by configuring max_session_keys=0, which disables TLS session resumption (see the sketch below). https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/tls.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-upstreamtlscontext
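
Applied to the reproduction config above, that would look roughly like this on the cluster (a sketch; only the transport socket changes, everything else stays as-is):

      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          # 0 disables the upstream TLS session cache, so no resumed session
          # (and its original SNI) can be reused across endpoints that
          # resolve to the same host.
          max_session_keys: 0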

sebas2day commented 2 years ago

max_session_keys=0 indeed resolves the issue. I'm not sure whether disabling TLS session resumption could cause performance issues for clients. For now, I tested this locally on my machine and don't see any real difference; we probably need to verify this on deployed environments.

lizan commented 2 years ago

@Shikugawa this is not related to TLS session resumption at all. It is HTTP connection pool management.

because service can have around 3000 different values you get 3000 clusters for a single application.

If they are the same application, why would they return 421 even when the endpoint has the capacity to respond to the request?

Using lb_endpoints seemed like a good solution but I don't understand why it reuses the SNI of the previous call even though the host header is actually different.

HTTPS allows us to reuse connections as much as possible for efficiency. Even browsers send requests over the same HTTPS connection when the host header is different, as long as the hosts resolve to the same IP and the certificate matches. 421 indicates that the request should be retried (and that's why 421 exists). See https://github.com/envoyproxy/envoy/issues/6767#issuecomment-488811660, which is the issue tracking that retry behavior.
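
For reference, a route-level retry policy matching on 421 would look roughly like the sketch below; whether the retry actually lands on a different connection is exactly what that linked issue is about, so treat this as a sketch rather than a confirmed fix:

route:
  cluster: "example_application"
  retry_policy:
    # retry requests that were rejected with 421 (Misdirected Request)
    retry_on: "retriable-status-codes"
    retriable_status_codes: [ 421 ]
    num_retries: 1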

Shikugawa commented 2 years ago

@lizan As described in #6767, the unacceptable behavior is as follows: there are two origin servers with different certificates (in the RFC 7540 example, one has *.example.com in the SAN field, the other has a.example.com). In this case, however, the two origins share the same wildcard certificate. As far as I know, with HTTP/2 over TLS, connection reuse may occur when requests go to the same IP address and hostname, and it is acceptable as long as the certificate presented by the origin is valid. In this case all the conditions for reusing the connection are satisfied, so I think the behavior here follows the HTTP/2 spec. This is why I considered that this problem does not originate from HTTP connection management. I couldn't determine where the 421 came from, but according to the spec it should be sent by the origin server.

lizan commented 2 years ago

@sebas2day Back to your original issue: I think using the dynamic forward proxy might be the fastest way to resolve it without configuring all clusters. That might be more suitable for your use case, since it resolves DNS and treats every endpoint differently by hostname.

sebas2day commented 2 years ago

@lizan

Back to your original issue: I think using the dynamic forward proxy might be the fastest way to resolve it without configuring all clusters. That might be more suitable for your use case, since it resolves DNS and treats every endpoint differently by hostname.

That's actually what we tried initially, but interestingly enough you get the exact same issue. That's when we started investigating different configurations to fix it, unfortunately to no avail.

Please have a look at the following config along with the setup I described in the initial post:

node:
  id: envoy_example
  cluster: envoy_example
static_resources:
  listeners:
    - name: envoy_proxy
      address:
        socket_address:
          address: '0.0.0.0'
          port_value: '8080'
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: test
                route_config:
                  name: route_configuration
                  virtual_hosts:
                    - name: envoy_host
                      domains: [ "*" ]
                      routes:
                        - name: some_route
                          match:
                            prefix: "/"
                          route:
                            cluster: "example_application"
                          typed_per_filter_config:
                            envoy.filters.http.dynamic_forward_proxy:
                              "@type": type.googleapis.com/envoy.extensions.filters.http.dynamic_forward_proxy.v3.PerRouteConfig
                              host_rewrite_header: ':destination'
                http_filters:
                  - name: envoy.filters.http.lua
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
                      inline_code: |
                        function envoy_on_request(request_handle)
                          local service = request_handle:headers():get("service")
                          request_handle:headers():replace(":destination", service .. ".example.com:8002")
                        end
                  - name: envoy.filters.http.dynamic_forward_proxy
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.dynamic_forward_proxy.v3.FilterConfig
                      dns_cache_config:
                        name: dynamic_forward_proxy_cache_config
                        dns_lookup_family: V4_ONLY
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                      suppress_envoy_headers: true
  clusters:
    - name: "example_application"
      connect_timeout: 1s
      lb_policy: CLUSTER_PROVIDED
      cluster_type:
        name: envoy.clusters.dynamic_forward_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
          dns_cache_config:
            name: dynamic_forward_proxy_cache_config
            dns_lookup_family: V4_ONLY
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          common_tls_context:
            validation_context:
              trusted_ca:
                filename: /etc/ssl/certs/ca-certificates.crt
              trust_chain_verification: ACCEPT_UNTRUSTED

sebas2day commented 2 years ago

@Shikugawa

But in this case, two origins share same wildcard certificate. As far as I know, connection reuse may occur if the request has the same IP address and hostname in the case of HTTP/2 with TLS, and it is acceptable if a presented certificate from the origin is valid. In this case, all the conditions to the reuse connection are satisfied.

I'm curious why the resolved IP address (which ends up being the same) for both calls matters and leads to different behavior. As a user, I ideally don't want to think about which endpoint lives on which host. I would expect that when I explicitly state endpoints with an explicit SNI, Envoy would not attempt to reuse the same connection but make a separate connection for each endpoint instead; calls to an endpoint can then reuse their dedicated connection.

Shikugawa commented 2 years ago

@sebas2day Let's go back to the first discussion. In the current configuration, the upstream connection is supposed to use HTTP/1.1 and not reuse connections, so there is no problem there because each endpoint uses a different connection. I think this problem is caused by an issue in Envoy's certificate validation when reusing TLS sessions; therefore, the problem can be solved by not reusing the session.

sebas2day commented 2 years ago

Sorry, I was mixing up connections with TLS sessions. I checked, and the issue occurs regardless of whether it's HTTP/2, and also when the endpoints have different certificates rather than a shared wildcard.

I think this problem is caused by Envoy's problem with certificate validation when reusing TLS sessions. Therefore, the problem can be solved by not reusing the session.

To me, TLS session resumption sounds like a good thing to have, but I think I want it per endpoint rather than per host? Disabling it sounds like a workaround rather than a fix. Please correct me if I'm wrong, since my knowledge in this area is quite limited.

Shikugawa commented 2 years ago

@PiotrSikora I don't know the details of this implementation, but I think the current implementation reuses the session ticket on the connection as long as one exists. Does it make sense to do SNI-based ticket selection here? https://github.com/envoyproxy/envoy/blob/v1.20.0/source/extensions/transport_sockets/tls/context_impl.cc#L648-L672

lizan commented 2 years ago

@Shikugawa again, this is not related to TLS session tickets in any way. If DNS resolves to the same IP address, Envoy will reuse the connection even over HTTP/1.1. That matches browsers' behavior as well.

Shikugawa commented 2 years ago

OK, I will investigate further. But in the current implementation we don't have any way to avoid this problem other than disabling session resumption, and that is just a workaround.

Shikugawa commented 2 years ago

@lizan This is the trace result (both DFP and non-DFP show almost the same result): https://gist.github.com/Shikugawa/537f3df4fe9c58f20ebdcf94c1ca1952 From my investigation, the current config doesn't reuse the previous HTTP connection... You can find the actual logs there. As for what you said, SNI-based connection reuse should be implemented, but I also think that is not the root cause of this problem...

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.