envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Potential Bug in Internal Listener's Handling of Filter Metadata #35711

Open agrawroh opened 4 weeks ago

agrawroh commented 4 weeks ago

Description

We're using Envoy to establish an mTLS tunnel, following a setup quite similar to the standard configuration. On top of that, we use the Set Filter State HTTP filter to pass the XFCC header from the outer listener to the internal listener. On the internal listener, we apply a Header Mutation filter that adds new headers using expressions like %FILTER_STATE(tunneling.x-real-ip:PLAIN)% to read the state set by the outer/main listener.

However, under high load we've observed what looks like a race condition: the metadata set by the outer listener doesn't match what the internal listener receives. State from different requests gets mixed up, so a request made with Certificate A arrives at the internal listener but has its XFCC header populated from Certificate B.

Notes:

We also tried limiting each connection to a single request, but it didn't help:

common_http_protocol_options:
  max_requests_per_connection: 1
  max_stream_duration: 3600s
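
(Note: for this to take effect, the snippet has to live under the internal listener cluster's typed_extension_protocol_options, roughly as sketched below; explicit_http_config is an assumption to keep that upstream on HTTP/1.1. One hedged caveat: since the CONNECT-terminating route tunnels the payload as raw TCP to the internal listener, HTTP-level limits like this may not govern that hop at all, which would be consistent with it not helping.)

clusters:
- name: internal_listener_cluster
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      common_http_protocol_options:
        max_requests_per_connection: 1
        max_stream_duration: 3600s
      explicit_http_config:
        http_protocol_options: {}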

Repro Steps

You'll have to generate the cert files and then start both the client & server Envoys. For the certs, one possible recipe (a sketch: subject names are illustrative, the leaves are signed directly by the root rather than via an intermediate, and the outputs should land in /certs/envoy/):
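
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=test-root" -keyout root-key -out 2022-11-18-root
for leaf in leaf leaf2 leaf3; do
  openssl req -newkey rsa:2048 -nodes -subj "/CN=${leaf}" \
    -keyout "2022-11-18-${leaf}-key" -out "${leaf}.csr"
  openssl x509 -req -in "${leaf}.csr" -CA 2022-11-18-root -CAkey root-key \
    -CAcreateserial -days 365 -out "2022-11-18-${leaf}-int"
done

Then start the server & client Envoys: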

envoy --use-dynamic-base-id --log-level debug --config-path server.yaml
envoy --use-dynamic-base-id --config-path client.yaml

Start the Python Server for logging:

python3 pyhttp.py --port 1234 &> 1234.log
python3 pyhttp.py --port 1235 &> 1235.log

Now, if you make a ton of requests like this:

for i in $(seq 1 11); do
  curl http://127.0.0.1:9997/foo &
  curl http://127.0.0.1:9998/pyhttp.py &
done

Then grep the logs from either Python server (we run two, and each should only ever see one type of request), and you'll see that some requests have mismatched values:

x-forwarded-client-cert: Hash=3462945ab417c19...
x-forwarded-client-cert: Hash=c9894c243b2538d...
x-forwarded-client-cert: Hash=c9894c243b2538d...
x-forwarded-client-cert: Hash=3462945ab417c19...

Configuration

Simple Server:

#!/usr/bin/env python3

import http.server as SimpleHTTPServer
import socketserver as SocketServer
import logging
import argparse

class GetHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):

    def do_GET(self):
        # Log all request headers (including x-forwarded-client-cert) to stderr.
        logging.error(self.headers)
        SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)

parser = argparse.ArgumentParser()
parser.add_argument('--port', type=int, default=1234)
args = parser.parse_args()

Handler = GetHandler
# allow_reuse_address must be set on the class before the socket is bound,
# or it has no effect.
SocketServer.TCPServer.allow_reuse_address = True
httpd = SocketServer.TCPServer(("127.0.0.1", args.port), Handler)

httpd.serve_forever()

Server Envoy Configuration:

admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
bootstrap_extensions:
- name: envoy.bootstrap.internal_listener
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.bootstrap.internal_listener.v3.InternalListener
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 9999 }
    access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /logs/envoy-server-access.log
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          access_log:
            - name: envoy.access_loggers.file
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                path: /logs/envoy-server-access.log
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  path: /dp-cp-tls-tunnel
                  headers:
                  - name: ":method"
                    exact_match: POST
                route:
                  cluster: internal_listener_cluster
                  upgrade_configs:
                  - upgrade_type: CONNECT
                    connect_config:
                      allow_post: true
              - match: { prefix: "/" }
                route:
                  cluster: some_other_service
          http_filters:
          - name: envoy.filters.http.set_filter_state
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
              on_request_headers:
              - object_key: internal_listener_cluster.x-forwarded-client-cert
                factory_key: envoy.string
                format_string:
                  text_format_source:
                    inline_string: "%REQ(x-forwarded-client-cert)%"
                  omit_empty_values: true
                shared_with_upstream: ONCE
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          http_protocol_options: {}
          forward_client_cert_details: SANITIZE_SET
          set_current_client_cert_details:
            cert: true
          upgrade_configs:
          - upgrade_type: CONNECT
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          require_client_certificate: true
          common_tls_context:
            tls_certificates:
            - certificate_chain:
                filename: /certs/envoy/2022-11-18-leaf-int
              private_key:
                filename: /certs/envoy/2022-11-18-leaf-key
            validation_context:
              trusted_ca:
                filename: /certs/envoy/2022-11-18-root
  - name: listener_1
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /logs/envoy-access.log
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          access_log:
            - name: envoy.access_loggers.file
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                path: /logs/envoy-access.log
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/aaa" }
                direct_response:
                  status: 200
                  body:
                    inline_string: "Hello World.\n"
              - match: { prefix: "/" }
                route: { cluster: some_service }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          require_client_certificate: true
          common_tls_context:
            tls_certificates:
            - certificate_chain:
                filename: /certs/envoy/2022-11-18-leaf-int
              private_key:
                filename: /certs/envoy/2022-11-18-leaf-key
            validation_context:
              trusted_ca:
                filename: /certs/envoy/2022-11-18-root
  - name: my_internal_listener
    internal_listener: {}
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          access_log:
            - name: envoy.access_loggers.file
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                path: /logs/envoy-server-internal-access.log
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/bbb" }
                direct_response:
                  status: 200
                  body:
                    inline_string: "Hello World!!\n"
              - match: { prefix: "/foo" }
                route: { cluster: some_other_service }
              - match: { prefix: "/" }
                route: { cluster: some_service }
          http_filters:
          - name: envoy.filters.http.header_mutation
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.header_mutation.v3.HeaderMutation
              mutations:
                request_mutations:
                - append:
                    header:
                      key: x-forwarded-client-cert
                      value: "%FILTER_STATE(internal_listener_cluster.x-forwarded-client-cert:PLAIN)%"
                    append_action: OVERWRITE_IF_EXISTS_OR_ADD
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: some_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: some_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 1234
  - name: some_other_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: some_other_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 1235
  - name: internal_listener_cluster
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.internal_upstream.v3.InternalUpstreamTransport
        transport_socket:
          name: "envoy.transport_sockets.raw_buffer"
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.raw_buffer.v3.RawBuffer
    load_assignment:
      cluster_name: internal_listener_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              envoy_internal_address:
                server_listener_name: my_internal_listener

Client Envoy Configuration:

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 127.0.0.1
        port_value: 9998
    filter_chains:
    - filters:
      - name: tcp
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: tcp_stats
          cluster: "cluster_0"
          tunneling_config:
            hostname: localhost:443
            use_post: true
            post_path: /dp-cp-tls-tunnel
            headers_to_add:
            - header:
                key: foo1
                value: bar1
  - name: listener_1
    address:
      socket_address:
        protocol: TCP
        address: 127.0.0.1
        port_value: 9997
    filter_chains:
    - filters:
      - name: tcp
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: tcp_stats
          cluster: "cluster_1"
          tunneling_config:
            hostname: localhost:443
            use_post: true
            post_path: /dp-cp-tls-tunnel
            headers_to_add:
            - header:
                key: foo1
                value: bar1
  clusters:
  - name: cluster_0
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: /certs/envoy/2022-11-18-leaf2-int
            private_key:
              filename: /certs/envoy/2022-11-18-leaf2-key
    connect_timeout: 5s
    # This ensures HTTP/1.1 CONNECT is used for establishing the tunnel.
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http_protocol_options: {}
    load_assignment:
      cluster_name: cluster_0
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 9999
  - name: cluster_1
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain:
              filename: /certs/envoy/2022-11-18-leaf3-int
            private_key:
              filename: /certs/envoy/2022-11-18-leaf3-key
    connect_timeout: 5s
    # This ensures HTTP/1.1 CONNECT is used for establishing the tunnel.
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http_protocol_options: {}
    load_assignment:
      cluster_name: cluster_1
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 9999
wbpcode commented 4 weeks ago

@kyessenov

kyessenov commented 4 weeks ago

This is because connections to the internal listener are pooled, since the first listener is HTTP. So when you get two HTTP connections to the first listener, they share the upstream connection to the internal listener, and the first one to make that connection gets the filter state. Remember, the transfer of the data happens at the TCP level, not HTTP. If you want HTTP-level transfer, the best option is to inject a header and then parse it in the internal listener. Another alternative is to implement a hash() method on the transferred filter state object, which would create distinct connection pools to the internal listener based on the value of the hash. Unfortunately, there's no out-of-the-box way right now; envoy.string doesn't implement it, but I think we can fix that and add another envoy.hashable_string that supports a unique hash.

agrawroh commented 4 weeks ago

> This is because connections to the internal listener are pooled, since the first listener is HTTP. So when you get two HTTP connections to the first listener, they share the upstream connection to the internal listener, and the first one to make that connection gets the filter state. Remember, the transfer of the data happens at the TCP level, not HTTP. If you want HTTP-level transfer, the best option is to inject a header and then parse it in the internal listener. Another alternative is to implement a hash() method on the transferred filter state object, which would create distinct connection pools to the internal listener based on the value of the hash. Unfortunately, there's no out-of-the-box way right now; envoy.string doesn't implement it, but I think we can fix that and add another envoy.hashable_string that supports a unique hash.

@kyessenov Thank you for the insights. I have a couple of quick questions:

  1. I’ve set max_requests_per_connection to 1 on my upstream cluster, which points to the internal listener, with the expectation that each connection would only serve a single request. Is this correct?
  2. You mentioned that for HTTP-level transfer, injecting and parsing headers is the best option. Could you provide any pointers or examples on how to pass a list of headers from the outer listener to the internal listener?
kyessenov commented 4 weeks ago
  1. It may be close, but the upstream is autonomous in Envoy, so there may be a race where the connection that initiated an upstream connection is not the one that ends up using it.
  2. It's just standard header mutation rules. There are several ways, and I think there is even standard XFCC support in HCM.
agrawroh commented 4 weeks ago

"It's just standard header mutation rules. There are several ways, and I think there is even standard XFCC support in HCM." - Sorry, I was trying to understand how to pass the headers from outer listener to the internal listener. The only option I could find to do a state transfer is using metadata (Dynamic Metadata and State Filter).

kyessenov commented 4 weeks ago

I mean both terminating and internal listeners are HTTP - you can just decode it per-request and pass data this way.

agrawroh commented 4 weeks ago

> I mean both terminating and internal listeners are HTTP - you can just decode it per-request and pass data this way.

In this case the outer listener is getting the encapsulated HTTPS traffic from the client and sending it to the internal listener which then decapsulates it:

route:
  cluster: internal_listener_cluster
  upgrade_configs:
  - upgrade_type: CONNECT
    connect_config:
      allow_post: true

I tried adding headers on the request stream like this, but the internal listener doesn't see them:

request_headers_to_add:
- header:
    key: h1
    value: v1
  append_action: OVERWRITE_IF_EXISTS_OR_ADD
- header:
    key: h2
    value: v2
  append_action: OVERWRITE_IF_EXISTS_OR_ADD
kyessenov commented 4 weeks ago

@agrawroh I'm sorry, yes, I missed CONNECT termination. Then the path is to define a "hashable" filter state object. I can follow up with first-class support for this, or a custom object would also work.

agrawroh commented 3 weeks ago

@kyessenov I opened a PR following your suggestion to implement a new envoy.hashable_string. Could you PTAL?
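
Config-wise, the change would just be swapping the factory key in the outer listener's set_filter_state filter, roughly like this (hypothetical until the PR merges; the key name follows the suggestion above):

- name: envoy.filters.http.set_filter_state
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.set_filter_state.v3.Config
    on_request_headers:
    - object_key: internal_listener_cluster.x-forwarded-client-cert
      # envoy.hashable_string is assumed to implement hash(), so distinct
      # values get distinct connection pools to the internal listener.
      factory_key: envoy.hashable_string
      format_string:
        text_format_source:
          inline_string: "%REQ(x-forwarded-client-cert)%"
        omit_empty_values: true
      shared_with_upstream: ONCE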

kyessenov commented 3 weeks ago

I'm trying to parse the config and not seeing L7 pooling for internal connections. The first listener terminates the CONNECT, and then sends a raw TCP stream to the internal listener. That should mean each CONNECT stream is 1-1 with the internal connection, no? In that case, I don't understand where the sharing can come from.
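
One way to check from the server Envoy's admin endpoint is to compare the standard upstream counters for the internal cluster (a sketch; 9901 is the admin port from the config above):

curl -s 127.0.0.1:9901/stats | grep -E 'cluster\.internal_listener_cluster\.upstream_(cx|rq)_total'

If upstream_rq_total grows noticeably faster than upstream_cx_total, requests are being multiplexed over shared connections to the internal listener.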

agrawroh commented 3 weeks ago

> I'm trying to parse the config and not seeing L7 pooling for internal connections. The first listener terminates the CONNECT, and then sends a raw TCP stream to the internal listener. That should mean each CONNECT stream is 1-1 with the internal connection, no? In that case, I don't understand where the sharing can come from.

@kyessenov The outer (downstream) listener accepts mixed traffic and, based on the route matching rules, terminates the CONNECT and sends the raw TCP streams to the internal listener for decapsulation. The internal listener should only receive one type of traffic, i.e. raw TCP that needs decapsulating, but given that the outer listener receives mixed traffic, is it possible that the connections to the internal listener are somehow being shared?

I'm not very familiar with this code, but implementing the hashing solved the issue and we no longer see any state sharing.

wbpcode commented 3 weeks ago

I don't think we've found the root cause yet. As @kyessenov said, an upgraded connection will never be shared. One possibility is that the clients didn't send requests with CONNECT.

Maybe a connect matcher could be used to ensure only CONNECT requests are routed to the internal listener.
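
Something like this on the outer listener's route config (a sketch; whether connect_matcher also catches the POST-as-CONNECT upgrade used here is an assumption that would need verifying):

routes:
- match:
    connect_matcher: {}
  route:
    cluster: internal_listener_cluster
    upgrade_configs:
    - upgrade_type: CONNECT
      connect_config:
        allow_post: true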

The hashable string may work around the issue, but it would be better to find the actual reason first.

agrawroh commented 3 weeks ago

> I don't think we've found the root cause yet. As @kyessenov said, an upgraded connection will never be shared. One possibility is that the clients didn't send requests with CONNECT.
>
> Maybe a connect matcher could be used to ensure only CONNECT requests are routed to the internal listener.
>
> The hashable string may work around the issue, but it would be better to find the actual reason first.

In this case, the client is also an Envoy proxy and is the only client sending requests to the server (also an Envoy proxy). The client Envoy only sends HTTP POST requests. I agree that we should keep this open and find the actual root cause.

botengyao commented 2 weeks ago

> This is because connections to the internal listener are pooled, since the first listener is HTTP. So when you get two HTTP connections to the first listener, they share the upstream connection to the internal listener, and the first one to make that connection gets the filter state. Remember, the transfer of the data happens at the TCP level, not HTTP. If you want HTTP-level transfer, the best option is to inject a header and then parse it in the internal listener. Another alternative is to implement a hash() method on the transferred filter state object, which would create distinct connection pools to the internal listener based on the value of the hash. Unfortunately, there's no out-of-the-box way right now; envoy.string doesn't implement it, but I think we can fix that and add another envoy.hashable_string that supports a unique hash.

@kyessenov, related to HTTP data propagation, do you think we could implement a map in the filter state, like Map<request-id, metadata> from the previous HCM, and then propagate the filter state to the internal listener to be consumed? This would require all of the filter state to be shared.