envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

HTTP2 connection pool doesn't work with HTTP CONNECT #25504

Closed: mamazik closed this issue 1 year ago

mamazik commented 1 year ago

Description: I'm using Envoy v1.25.0 as a dynamic HTTP forward proxy with a connect_matcher route. For some reason, in this configuration it does not use the HTTP/2 connection pool, so every single request opens a brand-new TCP/IP connection.

I specifically mention HTTP CONNECT because if I run the tests below with HTTP instead of HTTPS (plain HTTP doesn't use HTTP CONNECT), the connection pool works perfectly fine.

This bug report is basically a copy of question #24702, which I raised a few months back (apologies for being noisy about this, but I would really love to get to the bottom of the issue!). I opened a new item because I'm almost positive the issue is either a bug, an undocumented limitation, or perhaps a misconfiguration of some sort (one that I can't seem to spot).

P.S. I also tried configuring it as an SNI dynamic forward proxy, but that didn't help.

Repro steps:

  1. Run Envoy with the configuration below;

  2. Make an HTTP/2 request over HTTPS through Envoy, for example with cURL:

    curl -Lvx localhost:10000 https://www.microsoft.com
  3. Run a command to list outgoing connections, for example with ss:

    ss -pi | grep -i envoy

    Result: the TCP/IP connection created by the request is closed almost immediately after the request completes. Expected: the TCP/IP connection created by the request stays open so it can be reused.

  4. Make a plain HTTP (not HTTPS) request through Envoy, for example with cURL:

    curl -Lvx localhost:10000 www.microsoft.com
  5. Run a command to list outgoing connections, for example with ss:

    ss -pi | grep -i envoy

    Result: the TCP/IP connection created by the request stays open and can be reused.

Config:

static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 127.0.0.1
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_protocol_options:
                  accept_http_10: true
                access_log:
                - name: envoy.file_access_log
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    path: "/var/log/envoy/envoy_access.log"
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: dynamic_forward_proxy_cluster
                            timeout: 0s
                        - match:
                            connect_matcher:
                              {}
                          route:
                            cluster: dynamic_forward_proxy_cluster
                            timeout: 0s
                            upgrade_configs:
                              - upgrade_type: CONNECT
                                connect_config:
                                  {}
                http_filters:
                  - name: envoy.filters.http.dynamic_forward_proxy
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.dynamic_forward_proxy.v3.FilterConfig
                      dns_cache_config:
                        name: dynamic_forward_proxy_cache_config
                        dns_lookup_family: V4_ONLY
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: dynamic_forward_proxy_cluster
      connect_timeout: 60s
      lb_policy: CLUSTER_PROVIDED
      cluster_type:
        name: envoy.clusters.dynamic_forward_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
          dns_cache_config:
            name: dynamic_forward_proxy_cache_config
            dns_lookup_family: V4_ONLY
admin:
  access_log_path: /var/log/envoy/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
pxpnetworks commented 1 year ago

Shouldn't you have this part in your dynamic_forward_proxy cluster config?

typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    explicit_http_config:
      http2_protocol_options: {}
mamazik commented 1 year ago

type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions

To be honest, I'm not sure. My configuration is just a (slightly modified) copy of the one from the docs, and apart from the connection pool, everything seems to work perfectly.

From what's written here, I also get the feeling that the whole point of envoy.extensions.upstreams.http.v3.HttpProtocolOptions is to declare cluster-related options under it rather than under the cluster definition itself, if that makes sense.

wbpcode commented 1 year ago

cc @alyssawilk

alyssawilk commented 1 year ago

I agree with @pxpnetworks. Your current configuration has no protocol configuration for your cluster, and I believe Envoy will default to HTTP/1.1. With HTTP/1.1 CONNECT, TCP connections cannot be reused, and the behavior you describe is expected.

mamazik commented 1 year ago

I agree with @pxpnetworks Your current configuration has no protocol configuration for your cluster, and I believe Envoy will default to HTTP/1.1.

I added http2_protocol_options: {} to my cluster config as per your suggestion (which I really hope I didn't misunderstand!), but that didn't change the behavior. Below is the new config for my cluster after the changes:

  clusters:
    - name: dynamic_forward_proxy_cluster
      connect_timeout: 60s
      http2_protocol_options:
        {}
      lb_policy: CLUSTER_PROVIDED
      cluster_type:
        name: envoy.clusters.dynamic_forward_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
          dns_cache_config:
            name: dynamic_forward_proxy_cache_config
            dns_lookup_family: V4_ONLY

With HTTP/1.1 connect TCP connections can not be reused and the behavior you describe is expected.

Perhaps I'm misunderstanding something, but as I understand it, a forward proxy only needs to establish a connection to the host via the HTTP CONNECT method. This part of the cURL output always contains HTTP/1.1, for example:

> CONNECT www.microsoft.com:443 HTTP/1.1
> Host: www.microsoft.com:443
> User-Agent: curl/7.87.0

I'm not an HTTP expert, but I tried googling HTTP CONNECT over HTTP/2 and such a thing doesn't seem to exist, so I assume this is not what you meant by "Envoy will default to HTTP/1.1".
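For reference, CONNECT is not limited to HTTP/1.1: RFC 9113 (section 8.5) defines it for HTTP/2, where the request carries only the :method and :authority pseudo-headers, and the encapsulate_in_http2_connect.yaml example linked later in this thread sends CONNECT over HTTP/2. The minimal Python sketch below contrasts the HTTP/1.1 CONNECT request shown above (mirroring only the three header lines cURL printed; real cURL may send more) with the equivalent HTTP/2 header list. The function names are hypothetical, and the HTTP/2 side only builds the header list; it does no HPACK encoding or framing.

```python
def http1_connect_request(host: str, port: int, user_agent: str = "curl/7.87.0") -> bytes:
    """Build an HTTP/1.1 CONNECT request like the one cURL sent to Envoy above."""
    authority = f"{host}:{port}"
    lines = [
        f"CONNECT {authority} HTTP/1.1",  # request target is host:port, not a path
        f"Host: {authority}",
        f"User-Agent: {user_agent}",
        "",  # these two empty entries produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def http2_connect_headers(host: str, port: int) -> list[tuple[str, str]]:
    """Pseudo-headers of an HTTP/2 CONNECT request (RFC 9113, section 8.5).

    Only :method and :authority are sent; :scheme and :path must be omitted.
    HPACK encoding and HEADERS framing are out of scope for this sketch.
    """
    return [(":method", "CONNECT"), (":authority", f"{host}:{port}")]


if __name__ == "__main__":
    print(http1_connect_request("www.microsoft.com", 443).decode("ascii"))
    print(http2_connect_headers("www.microsoft.com", 443))
```

The HTTP/1.1 line cURL prints only describes the client-to-proxy leg; it says nothing about which protocol the proxy would use upstream.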

Once the connection to the host has been established, the client starts talking to the host through that connection, and from what I understand, it's up to the client to decide whether to use HTTP/1.1 or HTTP/2. For example, in my infra I use the latest version of cURL, which defaults to HTTP/2, so this is what I see in the output when I make a request to https://www.microsoft.com:

> GET / HTTP/2
> Host: www.microsoft.com
> user-agent: curl/7.87.0

That said, I can easily switch to HTTP/1.1 with cURL's --http1.1 flag. If I do, the output contains the following:

> GET / HTTP/1.1
> Host: www.microsoft.com
> User-Agent: curl/7.87.0

I know it's been a long post already, but please allow me to post a few examples to better illustrate the current behavior.

HTTP/1.1 request to www.microsoft.com over HTTP

(Before this test I had to comment out http2_protocol_options: {}, because with it enabled I couldn't make an HTTP/1.1 request over HTTP.)

I open two SSH sessions to the Envoy proxy host. In one I run a command that lists existing TCP/IP connections (and prints the current time for reference). In the other I make an HTTP/1.1 request to www.microsoft.com over plain HTTP, to avoid using HTTP CONNECT.

Starting the loop to check existing connections in the first terminal:

for i in {1..1000}; do date; ss -pi | grep -i envoy; done

Running the request:

curl -Lvx localhost:10000 http://www.microsoft.com

cURL output proving it's indeed HTTP/1.1:

> GET http://www.microsoft.com/ HTTP/1.1
> Host: www.microsoft.com
> User-Agent: curl/7.87.0

First occurrence of the connection in the output of the ss command:

Wed 15 Feb 08:50:33 EET 2023
tcp    ESTAB      0      0      192.168.100.1:33722                2.16.229.138:http                  users:(("envoy",pid=3213,fd=29))

Even 20 seconds later, the connection is still there:

Wed 15 Feb 08:50:53 EET 2023
tcp    ESTAB      0      0      192.168.100.1:33722                2.16.229.138:http                  users:(("envoy",pid=3213,fd=29))

Result: a new request does not open a new TCP/IP connection, so as far as I can tell the connection pool works perfectly fine with HTTP/1.1.

HTTP/2 request to www.microsoft.com over HTTPS

(Before this test I restored http2_protocol_options: {} in the config.)

The setup is the same, but this time I make an HTTP/2 request to www.microsoft.com over HTTPS, which goes through HTTP CONNECT.

Starting the loop to check existing connections in the first terminal:

for i in {1..1000}; do date; ss -pi | grep -i envoy; done

Running the request:

curl -Lvx localhost:10000 https://www.microsoft.com

cURL output proving it's indeed HTTP/2:

> GET / HTTP/2
> Host: www.microsoft.com
> user-agent: curl/7.87.0

First occurrence of the connection in the output of the ss command:

Wed 15 Feb 08:58:43 EET 2023
tcp    ESTAB      0      0      127.0.0.1:ndmp                 127.0.0.1:57786                 users:(("envoy",pid=16934,fd=28))
tcp    ESTAB      0      0      192.168.100.1:50680                2.20.149.232:https                 users:(("envoy",pid=16934,fd=29))

Last occurrence of the connection in the output:

Wed 15 Feb 08:58:45 EET 2023
tcp    ESTAB      0      0      127.0.0.1:ndmp                 127.0.0.1:57786                 users:(("envoy",pid=16934,fd=28))
tcp    ESTAB      0      0      192.168.100.1:50680                2.20.149.232:https                 users:(("envoy",pid=16934,fd=29))

When I make a new request, I'm seeing a brand new TCP/IP connection:

Wed 15 Feb 08:59:30 EET 2023
tcp    ESTAB      0      0      127.0.0.1:ndmp                 127.0.0.1:58530                 users:(("envoy",pid=16934,fd=28))
tcp    ESTAB      0      0      192.168.100.1:50666                2.16.229.138:https                 users:(("envoy",pid=16934,fd=29))

Result: here the connection lasted only for the duration of the request, and a new request created a brand-new connection. Compared to the first test, the connection pool does not seem to work in this case.
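Comparing the ss snapshots by eye works, but the check can also be scripted. A minimal Python sketch, assuming the ss column layout shown above (Netid, State, Recv-Q, Send-Q, local address, peer address, process) and treating any established, non-loopback TCP socket as an upstream connection; the helper names are hypothetical:

```python
def upstream_sockets(ss_output: str) -> set[tuple[str, str]]:
    """Return (local, peer) pairs of established TCP sockets that are not on loopback.

    Assumes the ss column layout shown above:
    Netid State Recv-Q Send-Q Local-Address:Port Peer-Address:Port Process
    Extra info lines printed by `ss -i` do not start with "tcp" and are skipped.
    """
    sockets = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp" or fields[1] != "ESTAB":
            continue
        local, peer = fields[4], fields[5]
        if local.startswith("127.") or peer.startswith("127."):
            continue  # the curl <-> Envoy leg on 127.0.0.1 is not an upstream connection
        sockets.add((local, peer))
    return sockets


def new_upstream_connections(before: str, after: str) -> set[tuple[str, str]]:
    """Upstream sockets present in the `after` snapshot but not in `before`."""
    return upstream_sockets(after) - upstream_sockets(before)
```

Fed the 08:58:45 snapshot as `before` and the 08:59:30 snapshot as `after`, new_upstream_connections returns the single socket ("192.168.100.1:50666", "2.16.229.138:https"), i.e. the second request did not reuse the first upstream connection. An empty result would indicate reuse.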

pxpnetworks commented 1 year ago

@mamazik Sorry for not being clearer. The protocol options config should go like this (the whole cluster config is below):

    clusters:
    - name: dynamic_forward_proxy_cluster
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      connect_timeout: 60s
      lb_policy: CLUSTER_PROVIDED
      cluster_type:
        name: envoy.clusters.dynamic_forward_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
          dns_cache_config:
            name: dynamic_forward_proxy_cache_config
            dns_lookup_family: V4_ONLY
mamazik commented 1 year ago

@mamazik Sorry for not being more clear the protocol options config should go like this - the whole cluster config below:

No problem at all, and many thanks for your suggestions, I really appreciate the effort!

I actually tried configuring it the way you described yesterday, but I couldn't figure out why the Envoy service wouldn't start afterwards. I tried again now and noticed it produces the following error:

Feb 15 09:46:44 EDITED error initializing configuration EDITED cluster must have auto_sni and auto_san_validation true unless allow_insecure_cluster_options is set.

So I changed the configuration to the following:

  clusters:
    - name: dynamic_forward_proxy_cluster
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          upstream_http_protocol_options:
            auto_sni: true
            auto_san_validation: true
          explicit_http_config:
            http2_protocol_options: {}
      connect_timeout: 60s
      lb_policy: CLUSTER_PROVIDED
      cluster_type:
        name: envoy.clusters.dynamic_forward_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
          dns_cache_config:
            name: dynamic_forward_pr

Unfortunately, this did not change the results of my tests above. HTTP/2 connections over HTTPS with HTTP CONNECT are still lingering only for the duration of the corresponding requests, and every new request continues to establish a brand new connection.

pxpnetworks commented 1 year ago

I've seen that error too, and I mitigated it by setting allow_insecure_cluster_options in the dynamic forward proxy cluster config. My working config is below; however, after some tests it also doesn't reuse TCP connections. I wonder whether upstream H2 can be reproduced with curl at all. When I use the --http2 option, the upstream responds with < HTTP/2 200, but the access log shows:

  "upstream_proto": null,
  "response_code": 200,

and the stats show:

envoy_cluster_upstream_cx_http2_total{envoy_cluster_name="dynamic_forward_proxy_cluster"} 0

What is odd is that if I try h2 against a local InfluxDB instance, Envoy actually does try explicit h2 towards it:

curl http://10.55.219.211:8086/ping -v --http1.1 -x http://10.155.15.235:3129

< HTTP/1.1 502 Bad Gateway

[2023-02-15 08:30:18.218][202689][debug][http2] [source/common/http/http2/codec_impl.cc:1375] [C5] invalid http2: Remote peer returned unexpected data while we expected SETTINGS frame. Perhaps, peer does not support HTTP/2 properly.

Access log:

  "response_code": 502,
  "upstream_proto": "HTTP/2",

Stats:

envoy_cluster_upstream_cx_http2_total{envoy_cluster_name="dynamic_forward_proxy_cluster"} 7

In conclusion, it seems that when CONNECT is used to proxy HTTPS traffic, Envoy doesn't use H2 towards the upstream. I had the impression it was working because I saw persistent connections, but those came from this config, which I also have:

preconnect_policy:
    per_upstream_preconnect_ratio: 2

My config:

- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: dynamic_forward_proxy_cluster
  connect_timeout: 5s
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.dynamic_forward_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
      allow_insecure_cluster_options: true
      dns_cache_config:
        name: dynamic_forward_proxy_cache_config
        dns_lookup_family: V4_ONLY
  preconnect_policy: # additional config not relevant to h2
    per_upstream_preconnect_ratio: 2 # additional config not relevant to h2
mamazik commented 1 year ago

In conclusion it seems when using CONNECT to proxy HTTPS traffic it doesnt do H2 with the upstream.

Thank you for sharing the details! Glad to hear I'm not the only one who's facing the issue.

In my case, these two counters increase every time I make an HTTP/2 request to https://www.microsoft.com with HTTP CONNECT:

http.ingress_http.downstream_cx_http1_total: 5
http.ingress_http.downstream_rq_http1_total: 5

while these two remain at 0:

http.ingress_http.downstream_cx_http2_total: 0
http.ingress_http.downstream_rq_http2_total: 0

These two also remain at 0, but perhaps we're using them differently:

cluster.dynamic_forward_proxy_cluster.upstream_cx_http1_total: 0
cluster.dynamic_forward_proxy_cluster.upstream_cx_http2_total: 0

However, my tests show that without HTTP CONNECT even HTTP/1.1 requests produce a persistent connection, so it seems to me that a persistent connection should normally be possible with HTTP CONNECT too, regardless of whether HTTP/1.1 or HTTP/2 is used.
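The counters quoted above come from Envoy's admin /stats endpoint, whose plain-text output is one "name: value" pair per line. A minimal Python sketch for reading them, assuming the admin listener from the config above (port 9901) is reachable on 127.0.0.1; the helper names are hypothetical:

```python
import urllib.request


def parse_stats(text: str) -> dict[str, int]:
    """Parse Envoy's plain-text /stats output ("name: value" per line).

    Lines whose value is not an integer (e.g. histogram summaries) are skipped.
    """
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # histogram percentiles and other non-integer values
    return stats


def fetch_stats(admin_url: str = "http://127.0.0.1:9901") -> dict[str, int]:
    """Fetch and parse /stats from the admin listener (assumed reachable locally)."""
    with urllib.request.urlopen(f"{admin_url}/stats", timeout=5) as resp:
        return parse_stats(resp.read().decode("utf-8"))


def protocol_breakdown(stats: dict[str, int], cluster: str = "dynamic_forward_proxy_cluster") -> dict[str, int]:
    """Pick the per-codec upstream connection counters for one cluster (missing -> 0)."""
    prefix = f"cluster.{cluster}."
    return {k: stats.get(prefix + k, 0) for k in ("upstream_cx_http1_total", "upstream_cx_http2_total")}
```

Calling protocol_breakdown(fetch_stats()) before and after a test request shows which upstream codec counter, if any, moved; in the CONNECT test above, neither does.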

mamazik commented 1 year ago

Hello,

@alyssawilk, could you please provide more suggestions? Our latest findings with @pxpnetworks show that H2 is not used even when all the necessary bits of Envoy configuration (like the cluster's http2_protocol_options: {}) are in place, which is strange. I can't find any indication of what else might be wrong with my configuration.

pxpnetworks commented 1 year ago

Interesting indeed. Not sure whether it's related, but I also noticed that when

typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    explicit_http_config:
      http2_protocol_options: {}

is attached to an upstream cluster used by TcpProxy + TunnelingConfig, there are a lot of active upstream connections.

I expected a single upstream connection between the TCP-encapsulating Envoy (A) and the upstream TCP-decapsulating Envoy (B), with all downstream TCP proxy connections on A multiplexed over that one H2 connection, but judging by the cx_active metric, that doesn't seem to be happening.

BR, Stoyan
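One thing worth ruling out when reading cx_active here: Envoy keeps a separate connection pool per worker thread, so even with ideal H2 multiplexing the expected number of upstream connections is at least one per busy worker, not one in total, and a preconnect_policy like the one pxpnetworks has enabled adds more. The Python sketch below is a hypothetical back-of-the-envelope model under loudly simplified assumptions (listed in the docstring); it is not Envoy's actual pool logic.

```python
import math


def min_h2_upstream_connections(tunnels_per_worker: list[int], max_concurrent_streams: int) -> int:
    """Lower bound on upstream H2 connections in a simplified, hypothetical model.

    Assumptions (not Envoy's exact pool logic):
    - every worker thread has its own connection pool, so workers never share connections;
    - each tunneled TCP connection uses exactly one H2 stream;
    - a pool opens another connection only when its existing ones are at max_concurrent_streams;
    - preconnecting, draining and connection lifetime limits are ignored.
    """
    if max_concurrent_streams < 1:
        raise ValueError("max_concurrent_streams must be at least 1")
    return sum(math.ceil(n / max_concurrent_streams) for n in tunnels_per_worker if n > 0)
```

For example, 10, 0, 3 and 5 active tunnels on four workers with a 100-stream limit give a lower bound of 3 connections, one per busy worker. A cx_active value far above this bound would point at something other than per-worker pooling.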

alyssawilk commented 1 year ago

I'm not sure what's still wrong with the config, but we have e2e tests of Envoy doing H2 CONNECT upstream, so maybe take a look at the configuration for those, or at the example configs, and see how yours differs.

mamazik commented 1 year ago

@alyssawilk, can you please tell us more about the configs you're referring to? Where can we find them (especially the E2E ones)?

In the documentation, when I searched for forward proxy configs, I could only find this one and this one. Neither of them has http2_protocol_options, by the way.

kyessenov commented 1 year ago

https://github.com/envoyproxy/envoy/blob/main/configs/encapsulate_in_http2_connect.yaml

mamazik commented 1 year ago

@kyessenov thank you!

I didn't find any (notable and relevant) difference between my config and the one you provided, or any other one in the same directory.

I believe there might be an issue with this specific combination: dynamic HTTP forward proxy + HTTP CONNECT (it's also quite suspicious that two different people have the same issue). From what I can see, you currently don't have any tests for it. Is it possible to test it?

pxpnetworks commented 1 year ago

Hi guys, I might be wrong, but mamazik's case terminates the downstream CONNECT directly in the HCM, and its upstream is a dynamic forward proxy cluster (DFPC) with H2 settings. However, I think the encapsulate_in_http2_connect.yaml config needs a separate upstream HCM, provided by decapsulate_http2_connect.yml, to terminate the CONNECT, and in that case the connection will most likely indeed be H2. In the first case, with mamazik's original config, only the TCP payload is forwarded, and only a plain TCP connection is established with the upstream, e.g. microsoft.com.

kyessenov commented 1 year ago

@yanjunxiang-google this might be something you tested.

yanjunxiang-google commented 1 year ago

IIUC, the behavior is by design.

A curl request to an HTTPS website makes the client send a CONNECT request to Envoy, which terminates the CONNECT. Envoy starts DNS resolution for the CONNECT target and creates a DFP cluster once resolution succeeds. It then establishes a TCP connection to the upstream. The follow-up GET request is treated as data and TCP-proxied to the upstream. This TCP connection cannot be reused and is closed after the request completes. There is an Envoy test that covers this: https://github.com/envoyproxy/envoy/blob/1ba94c89f9cd0a05a5f50cf031c43f2634d72fc6/test/extensions/filters/http/dynamic_forward_proxy/proxy_filter_integration_test.cc#L660

This is different from a curl request to an HTTP website, where an H2 connection is created between Envoy and the upstream and can be reused.

Adding @yanavlasov to this discussion.
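To make the difference concrete, here is a toy Python model (not Envoy code; the class and method names are hypothetical) of the two lifecycles described above: a plain HTTP request whose upstream connection returns to an idle pool, versus a terminated CONNECT whose upstream TCP connection is dedicated to one tunnel and closed with it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProxy:
    """Toy model of upstream connection usage; not Envoy code."""

    opened: int = 0  # total upstream TCP connections ever opened
    idle_pool: list[int] = field(default_factory=list)  # ids of idle, reusable connections

    def _open_connection(self) -> int:
        self.opened += 1
        return self.opened

    def plain_http_request(self) -> int:
        """Forward one plain-HTTP request; return the upstream connection id used."""
        # The proxy parses the request and response itself, so once the response
        # is complete the connection is idle again and goes back to the pool.
        conn = self.idle_pool.pop() if self.idle_pool else self._open_connection()
        self.idle_pool.append(conn)
        return conn

    def connect_tunnel(self) -> int:
        """Serve one terminated CONNECT; return the upstream connection id used."""
        # After the 200 response the bytes are opaque (e.g. a TLS session), so the
        # connection belongs to this one tunnel and is closed when the tunnel ends;
        # it is never put into the idle pool.
        return self._open_connection()
```

In this model, three sequential plain-HTTP requests all use connection 1, while three CONNECT tunnels open connections 1, 2 and 3, which matches the ss observations earlier in the thread.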

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

mamazik commented 1 year ago

As explained by @yanjunxiang-google, the proxy establishes a TCP connection with the upstream, and that connection (probably) cannot be reused. The only explanation I've found for why that might be the case is here. Thank you all for your valuable contributions!