apollographql / router

A configurable, high-performance routing runtime for Apollo Federation 🚀
https://www.apollographql.com/docs/router/

hyper's connection pool only opens one TCP connection for HTTP/2 #2063

Open ecourreges-orange opened 1 year ago

ecourreges-orange commented 1 year ago

Is your feature request related to a problem? Please describe.
I am benchmarking the router with a backend that is a bit far away on the network, so I need parallel calls to it, and I noticed that the router only establishes one HTTPS connection to the destination. A netstat -laputen easily shows that established client connections on port 4000 scale with the number of simultaneous clients, while backend connections to port 443 stay at exactly 1, whether there is traffic or not.

Describe the solution you'd like
Make it clear whether backend connection parallelism is configurable or not.
If it is not, it needs to be implemented: the point of writing the router in Rust is to support higher throughput, even for requests to a single backend.

I am looking into the code and learning a bit of Rust on the way ;-)

Thanks.

Geal commented 1 year ago

that is concerning, yes. A few questions:

* do you see it slowing down concurrent requests in your benchmarks?
* is the backend connection in HTTP 1 or HTTP/2? With HTTP/2 a lot of concurrent requests can be multiplexed over 1 connection

so far we have used hyper's HTTP client, which has an automatic connection pool, which should have been able to detect the need for more connections. We are not really satisfied with the lack of control there though, and are considering taking over that task from the client

ecourreges-orange commented 1 year ago

that is concerning, yes. A few questions:

* do you see it slowing down concurrent requests in your benchmarks?

Indeed, beyond a client concurrency of 10+ there is no performance gain and the CPU is capped at 13% for 150 req/s in my case, whereas with a home-made Node.js gateway I reach >300 req/s with the CPU as my bottleneck. Preliminary results show that the Rust router is 5-10x more CPU efficient than my Node.js version, but this connection bottleneck is unfortunately a showstopper so far.

* is the backend connection in HTTP 1 or HTTP/2? With HTTP/2 a lot of concurrent requests can be multiplexed over 1 connection

For now it's HTTP/1 over HTTPS. I will try to see if HTTP/2 is supported by my chain of go-wrk -> kong -> tomcat.

so far we have used hyper's HTTP client, which has an automatic connection pool, which should have been able to detect the need for more connections. We are not really satisfied with the lack of control there though, and are considering taking over that task from the client

Maybe you need to create the client with with_pool_config and set the max idle connections to more than the default of 0 or 1? If it works, a global env var associated with it would be great. It's worth a try, and/or improving hyper, before resorting to re-implementing: https://users.rust-lang.org/t/hyper-connection-pool-example/5315
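Something along these lines, maybe (a rough sketch of the pool knobs I mean, using hyper 0.14's builder with a plain HTTP connector and made-up values, not the router's actual setup):

use std::time::Duration;

use hyper::{client::HttpConnector, Body, Client};

// Illustrative only: hyper 0.14's connection pool options; the connector and the
// values here are examples, not what the router actually configures.
fn build_client() -> Client<HttpConnector, Body> {
    Client::builder()
        // keep up to 10 idle connections per host in the pool
        .pool_max_idle_per_host(10)
        // close connections that stay idle for more than 90 seconds
        .pool_idle_timeout(Duration::from_secs(90))
        .build_http()
}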

Thanks!

Geal commented 1 year ago

Maybe you need to create the client with with_pool_config and set the max idle connections to more than the default of 0 or 1? If it works, a global env var associated with it would be great. It's worth a try, and/or improving hyper, before resorting to re-implementing: https://users.rust-lang.org/t/hyper-connection-pool-example/5315

That message is from 2016; the pool has since been integrated directly into hyper's HTTP client, so this is the code that's apparently not parallelizing the requests. I'll look into why it's not working.

Geal commented 1 year ago

@ecourreges-orange we looked into it and the router is using the connection pool, starting multiple backend connections when there are concurrent clients. It is not a 1:1 correspondence (e.g. I saw 412 backend connections for 1000 clients in one test) but it should be enough to get some parallelism.

Could you describe a bit more your setup?

ecourreges-orange commented 1 year ago

@ecourreges-orange we looked into it and the router is using the connection pool, starting multiple backend connections when there are concurrent clients. It is not a 1:1 correspondence (e.g. I saw 412 backend connections for 1000 clients in one test) but it should be enough to get some parallelism.

Could you describe a bit more your setup?

* what are you using as client, and for the nodejs alternative (is it a gateway or a handrolled http proxy)

The client is go-wrk with concurrency 1 to 80. The comparison is with a Node.js gateway with code assembled by us from these examples.

* are you doing multiple times the same query, or different queries on each request?

Always the same query at each request.

* the output of your `netstat` command. Alternatively, you can use `ss -pot state established | grep router | grep "<ip:port of the subgraph>"` to see all of the established connections

Only one connection established at all times:

~/workspace/poc_nboa/Client_Simulator/go-wrk  (graphql)$ ss -pot state established | grep 10.118.44.10
0        0                   10.8.0.2:44108        10.118.44.10:https            users:(("router",pid=53378,fd=34)) timer:(keepalive,29sec,0)          
* what is the latency of the subgraph

Around 50ms because it is at least 35ms away on the network.

Concurrency 10:

~/workspace/poc_nboa/Client_Simulator/go-wrk  (graphql)$ ./go-wrk -no-vr -M POST -H="Authorization: Bearer $access_token" -H='Accept: application/json' -H='Content-Type: application/json' -body='{ "query": "query { fromCID(cid:\"9042393705\"){ score{FDDFQN{rank score}} }}"}' http://localhost:4000/graphql -d 10
Running 10s test @ http://localhost:4000/graphql
  10 goroutine(s) running concurrently
1432 requests in 10.036281715s, 183.20KB read
Requests/sec:       142.68
Transfer/sec:       18.25KB
Avg Req Time:       70.085766ms
Fastest Request:    49.043791ms
Slowest Request:    480.874469ms
Number of Errors:   0
Number of Connect timeouts: 0

Concurrency 40:

~/workspace/poc_nboa/Client_Simulator/go-wrk  (graphql)$ ./go-wrk -c 40 -no-vr -M POST -H="Authorization: Bearer $access_token" -H='Accept: application/json' -H='Content-Type: application/json' -body='{ "query": "query { fromCID(cid:\"9042393705\"){ score{FDDFQN{rank score}} }}"}' http://localhost:4000/graphql -d 10
Running 10s test @ http://localhost:4000/graphql
  40 goroutine(s) running concurrently
1672 requests in 10.112006354s, 213.90KB read
Requests/sec:       165.35
Transfer/sec:       21.15KB
Avg Req Time:       241.914027ms
Fastest Request:    187.099328ms
Slowest Request:    581.516919ms
Number of Errors:   0
Number of Connect timeouts: 0

Directly on the subgraph with concurrency 40 the go-wrk tool does generate many simultaneous connections:

/workspace/poc_nboa/Client_Simulator/go-wrk  (graphql)$ ss -pot state established | grep https
0        0                   10.8.0.2:55628        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=10)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55602        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=8)) timer:(keepalive,12sec,0)                    
0        0                   10.8.0.2:55654        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=14)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55748        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=25)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55768        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=29)) timer:(keepalive,12sec,0)                   
0        939                 10.8.0.2:55646        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=12)) timer:(on,204ms,0)                          
0        0                   10.8.0.2:55812        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=36)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55786        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=32)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55720        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=21)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55750        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=27)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55824        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=37)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55856        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=40)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55800        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=34)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55766        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=28)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55676        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=17)) timer:(keepalive,12sec,0)                   
0        0                   10.8.0.2:55862        10.117.28.10:https            users:(("go-wrk",pid=56079,fd=43)) timer:(keepalive,12sec,0)                   
...

And it goes faster:

Running 10s test @ https://poc-nboa.rec.api.hbx.geo.infra.ftgroup/valkeygg/v1/graphql
  40 goroutine(s) running concurrently
3907 requests in 10.040215678s, 3.68MB read
Requests/sec:       389.14
Transfer/sec:       375.79KB
Avg Req Time:       102.792072ms
Fastest Request:    47.581664ms
Slowest Request:    510.22633ms
Number of Errors:   0
Number of Connect timeouts: 0
* is the router compiled in debug or release mode

Release: the docker image FROM ghcr.io/apollographql/router:v1.2.1 as router. I also get the same behavior of only one connection from a cargo run of the latest version.

Note that there are no errors in the console of router:

~/workspace/router  (dev)$ cargo run --  --supergraph=../graphqlrouter/fedefilm.graphql --config=../graphqlrouter/router.yaml
    Finished dev [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/router --supergraph=../graphqlrouter/fedefilm.graphql --config=../graphqlrouter/router.yaml`
2022-11-24T15:13:19.340520Z  INFO Apollo Router v1.4.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)
2022-11-24T15:13:19.340631Z  INFO Anonymous usage data is gathered to inform Apollo product development.  See https://go.apollo.dev/o/privacy for more info.
2022-11-24T15:13:19.784616Z  INFO healthcheck endpoint exposed at http://127.0.0.1:8088/health
2022-11-24T15:13:19.785463Z  INFO GraphQL endpoint exposed at http://0.0.0.0:4000/graphql 🚀

If my Access Token expires, then I get errors for the subgraph on the router console.

I tried this in the router code to no avail:

--- a/apollo-router/src/services/subgraph_service.rs
+++ b/apollo-router/src/services/subgraph_service.rs
@@ -84,7 +84,7 @@ impl SubgraphService {
         Self {
             client: ServiceBuilder::new()
                 .layer(DecompressionLayer::new())
-                .service(hyper::Client::builder().build(connector)),
+                .service(hyper::Client::builder().pool_max_idle_per_host(10).build(connector)),
             service: Arc::new(service.into()),
         }
     }

My router config:

---
supergraph:
  listen: 0.0.0.0:4000
  path: /graphql
headers:
  all:
    request:
      - propagate:
          matching: ^authorization.*
include_subgraph_errors:
  all: true
  subgraphs:
    allofilm: true
    consofilm: true
telemetry:
  tracing:
    trace_config:
      service_name: "router"
      service_namespace: "apollo"
      # Optional. Either a float between 0 and 1 or 'always_on' or 'always_off'
      sampler: 1.0
      # Optional. Use a parent based sampler. This enables remote spans to help decide whether a span is sampled or not.
      # https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#parentbased
      parent_based_sampler: true

      # Optional limits
      max_attributes_per_event: 10
      max_attributes_per_link: 10
      max_attributes_per_span: 10
      max_events_per_span: 10
      max_links_per_span: 10
    propagation:
      trace_context: true
    zipkin:
      endpoint: http://mycollector.mydomain.com:9411/api/v2/spans

Response from the subgraph:

{"data":{"fromCID":{"score":{"FDDFQN":null}}},"extensions":{"tracing":{"version":1,"startTime":"2022-11-24T15:27:28.214377Z","endTime":"2022-11-24T15:27:28.252514Z","duration":38140752,"parsing":{"startOffset":116378,"duration":102577},"validation":{"startOffset":241902,"duration":120115},"execution":{"resolvers":[{"path":["fromCID"],"parentType":"Query","returnType":"CID","fieldName":"fromCID","startOffset":303700,"duration":427734},{"path":["fromCID","score"],"parentType":"CID","returnType":"Score","fieldName":"score","startOffset":762149,"duration":36637789},{"path":["fromCID","score","FDDFQN"],"parentType":"Score","returnType":"Scoring","fieldName":"FDDFQN","startOffset":37456428,"duration":622952}]}}}}

Response from the router:

{"data":{"fromCID":{"score":{"FDDFQN":null}}}}

I will try disabling telemetry to see if there is any improvement.

Geal commented 1 year ago

is the connection between the router and subgraph in HTTP 1 or 2?

ecourreges-orange commented 1 year ago

go-wrk to router is HTTP/2 or HTTP/1, and router to subgraph is HTTP/1. The subgraph is not configured for HTTP/2, and I'm not sure our gateway would enable it. Same results whatever the setting on go-wrk.

With debug logging, the router always seems to be reusing the idle connection:

{"timestamp":"2022-11-24T15:59:51.890134Z","level":"DEBUG","message":"reuse idle connection for (\"https\", myserver.mydomain.com)","target":"hyper::client::pool","filename":"/home/ecourreges/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/pool.rs","line_number":250,"span":{"name":"subgraph_request"},"spans":[{"trace_id":"007e38be9468c1e59601b198694f3c29","name":"request"},{"name":"supergraph"},{"name":"execution"},{"name":"fetch"},{"name":"subgraph"},{"name":"subgraph_request"}]}

Geal commented 1 year ago

I see :thinking: a few more questions, if you don't mind:

AFAIK the hyper options like pool_max_idle_per_host won't be of any help here. I'll have to dig deeper into the connection pool and see what happens. Thankfully, it looks like we'll soon get more control over the client behaviour: https://seanmonstar.com/post/701008919383932928/hyper-polish-period

Geal commented 1 year ago

Also, could you try to remove this line? https://github.com/apollographql/router/blob/f6e0ef8563161b79e4430a860dc18cba1e407798/apollo-router/src/services/subgraph_service.rs#L83

The idea I want to check here is that hyper's connection pool has a special case for HTTP/2: it will not start multiple concurrent connections if there's already one in the connecting state. And if a connection is already established, there is a race between getting a connection from the pool and establishing a new one. It might be that with this latency, establishing a new TLS connection (3 full round trips) is always slower than getting a connection from the pool.
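For context, the subgraph client's connector is built roughly like this (a sketch only; the exact builder chain in the file may differ); the line in question is the one that advertises HTTP/2 via ALPN:

use hyper::client::HttpConnector;
use hyper_rustls::HttpsConnector;

// Rough sketch of the kind of connector setup being discussed (hyper-rustls with the
// `rustls-native-certs` and `http2` features); not copied verbatim from subgraph_service.rs.
fn build_connector() -> HttpsConnector<HttpConnector> {
    hyper_rustls::HttpsConnectorBuilder::new()
        .with_native_roots()
        .https_or_http()
        .enable_http1()
        .enable_http2() // <- the line to try removing: it advertises HTTP/2 via ALPN
        .build()
}

With that call removed, TLS no longer negotiates HTTP/2, so the client stays on HTTP/1 and the pool can open one connection per concurrent request.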

ecourreges-orange commented 1 year ago

Wow! Great job, that line did the trick! So in HTTP/1 there is no possible race and it parallelizes properly. I guess you now have an issue to file with hyper ;-) Thanks a lot! I had disabled telemetry with no improvement, and have now reactivated it with no problems either. I will now be able to resume some testing with a release-compiled version, knowing that there might still be an improvement once HTTP/2 can be re-enabled and the race avoided.

Running 10s test @ http://localhost:4000/graphql
  40 goroutine(s) running concurrently
3049 requests in 10.064905637s, 390.06KB read
Requests/sec:       302.93
Transfer/sec:       38.75KB
Avg Req Time:       132.042054ms
Fastest Request:    51.185108ms
Slowest Request:    409.331539ms
Number of Errors:   0
Number of Connect timeouts: 0
~/workspace/poc_nboa/Client_Simulator/go-wrk  (graphql)$ ss -pot state established | grep https
0         0               192.168.17.128:51972           10.117.30.10:https      users:(("router",pid=60815,fd=103)) timer:(keepalive,58sec,0)                  
0         0               192.168.17.128:52196           10.117.30.10:https      users:(("router",pid=60815,fd=88)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52310           10.117.30.10:https      users:(("router",pid=60815,fd=65)) timer:(keepalive,59sec,0)                   
0         0               192.168.17.128:52432           10.117.30.10:https      users:(("router",pid=60815,fd=74)) timer:(keepalive,1min,0)                    
0         0               192.168.17.128:52286           10.117.30.10:https      users:(("router",pid=60815,fd=25)) timer:(keepalive,59sec,0)                   
0         0               192.168.17.128:51938           10.117.30.10:https      users:(("router",pid=60815,fd=97)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:51822           10.117.30.10:https      users:(("router",pid=60815,fd=27)) timer:(keepalive,58sec,0)                   
718       0               192.168.17.128:52080           10.117.30.10:https      users:(("router",pid=60815,fd=66)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52390           10.117.30.10:https      users:(("router",pid=60815,fd=29)) timer:(keepalive,59sec,0)                   
0         0               192.168.17.128:52422           10.117.30.10:https      users:(("router",pid=60815,fd=71)) timer:(keepalive,1min,0)                    
0         0               192.168.17.128:52374           10.117.30.10:https      users:(("router",pid=60815,fd=75)) timer:(keepalive,59sec,0)                   
0         0               192.168.17.128:52164           10.117.30.10:https      users:(("router",pid=60815,fd=81)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52262           10.117.30.10:https      users:(("router",pid=60815,fd=95)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52030           10.117.30.10:https      users:(("router",pid=60815,fd=80)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52116           10.117.30.10:https      users:(("router",pid=60815,fd=77)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52012           10.117.30.10:https      users:(("router",pid=60815,fd=78)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52184           10.117.30.10:https      users:(("router",pid=60815,fd=85)) timer:(keepalive,58sec,0)                   
0         0               192.168.17.128:52272           10.117.30.10:https      users:(("router",pid=60815,fd=99)) timer:(keepalive,58sec,0)    
Geal commented 1 year ago

awesome! I think I'll add an option to toggle http/2 support while we're investigating hyper

Geal commented 1 year ago

I'm having difficulty reproducing it, even by introducing delays in the connector and on packet reads.

Are you absolutely sure the subgraph connection is not done in HTTP/2? I'm asking because with a 50ms latency, on one connection we would only be able to do 20RPS, and here it's around 150RPS, so this feels like there's either pipelining (but I'm not sure hyper's client supports it) or HTTP/2 multiplexing.

you can look at the version by printing response.version() from https://github.com/apollographql/router/blob/f6e0ef8563161b79e4430a860dc18cba1e407798/apollo-router/src/services/subgraph_service.rs#L171
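For example, something like this (just a sketch; the response here stands for the http response the subgraph service gets back, and a tracing::debug! would work just as well as println!):

use http::{Response, Version};

// Sketch: report which protocol version a subgraph response was received over.
fn log_negotiated_version<B>(response: &Response<B>) {
    if response.version() == Version::HTTP_2 {
        println!("subgraph response received over HTTP/2");
    } else {
        println!("subgraph response received over {:?}", response.version());
    }
}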

ecourreges-orange commented 1 year ago

You are right, especially since disabling HTTP/2 obviously fixes the problem. I should have corrected my previous message: the requests to the subgraph were in HTTP/2. What got me confused is that I have 3 legs:

Geal commented 1 year ago

alright, that makes more sense then. And it matches an issue I've seen elsewhere: while in theory an HTTP/2 session can handle a lot of concurrent streams and use all the available bandwidth, since it is TCP under the hood, any delay or packet retransmission affects all the streams. In those cases the solution would be to open multiple connections with HTTP/2 (which hyper's pool disallows for now), move back to HTTP 1 with one connection per concurrent request, or move to HTTP 3, which does not suffer from the same retransmission issues

akhilesh-godi commented 1 year ago

Bumping this up as this is a problem for us too; we have been forced to disable http2 for quite some time now, for the same reasons described in this ticket.

Related: https://github.com/apollographql/router/issues/1956

marc-barry commented 1 year ago

Bumping this up as this is a problem for us too; we have been forced to disable http2 for quite some time now, for the same reasons described in this ticket.

@akhilesh-godi How did you disable HTTP/2 to the subgraphs in the router? Is there a configuration setting?

marc-barry commented 1 year ago

Unfortunately, it looks like Apollo Router has no way to disable HTTP/2 on the client side. The only option I can think of is to make the subgraph endpoints insecure with an http scheme and hope that Apollo Router's client library doesn't attempt http/2 over an insecure connection. It shouldn't, but we'll see.

ecourreges-orange commented 1 year ago

Unfortunately, it looks like Apollo Router has no way to disable HTTP/2 on the client side. The only option I can think of is to make the subgraph endpoints insecure with an http scheme and hope that Apollo Router's client library doesn't attempt http/2 over an insecure connection. It shouldn't, but we'll see.

As you can see from the comments above, you can fix this yourself if you are willing to recompile with this line disabled: https://github.com/apollographql/router/blob/dev/apollo-router/src/services/subgraph_service.rs#L135

I believe @Geal should commit with this line disabled, or make it configurable with a default of HTTP/1, until the issue with the underlying lib is fixed. As it is a one-liner, and the solution was found by @Geal, it is not my place to submit a PR.

Geal commented 1 year ago

instead of disabling it entirely, we will look at ways to balance subgraph calls

marc-barry commented 1 year ago

I have confirmed with certainty that the Apollo Router HTTP/2-to-subgraph implementation isn't working optimally. We recently transitioned from Apollo Server to Apollo Router and our metrics showed a regression in latency. It surprised us, so I set out to determine why. Upon finding this issue, I looked at our configuration and at the network connections out of our Apollo Router instances. Indeed, there was only one, regardless of the number of requests handled. Our subgraphs are all endpoints on Google Cloud internal load balancers, and there is no way to disable HTTP/2 on them; the documentation states that this must be disabled on the client side (which makes sense). Instead of abandoning Apollo Router, I decided to force HTTP/1.1 by simply making the subgraph endpoints insecure with http, hoping that the library the Router uses would treat HTTP/2 as only available over https. I can confirm that over http (insecure), HTTP/2 is indeed not used. When I did this the performance issues went away.

Our setup is the following:

There is something unique about our system in that we have GraphQL requests with high latency variation from ms to greater than 60s. This high variance is what probably causes us to see this performance issue more than others. In order to replicate the behaviour I suggest you use https://github.com/Shopify/toxiproxy in front of the subgraph endpoints and manipulate the latency (i.e. https://github.com/Shopify/toxiproxy#latency).

We are deciding how to proceed, but one option at this time is to fork Apollo Router, disable HTTP/2 in the source code, and build a custom version for our use case. We aren't Rust developers, but there appears to be a single location that needs to be updated to remove HTTP/2 support.

tmCision commented 1 year ago

Bumping this as we are also seeing this issue.
I know it is not ideal, but considering that this is a fairly fundamental issue (in my opinion the router cannot be used at scale as is) and is a few months old with a known workaround, wouldn't it make sense to temporarily allow http2 to be disabled by some configuration value instead of having a bunch of people make their own forks?

Geal commented 1 year ago

this is my next task so the fix should come soon. I'll add an option for that along with looking into better load balancing

Geal commented 1 year ago

there is now an option to deactivate HTTP/2. I'm reopening this issue though, as I'm still looking for a proper fix

akhilesh-godi commented 1 year ago

@marc-barry apologies for the delay - we had forked and compiled it ourselves after disabling http2.

sean-clarke commented 1 year ago

there is now an option to deactivate HTTP/2. I'm reopening this issue though, as I'm still looking for a proper fix

@Geal The linked MR has "experimental_enable_http2". It might be a wording confusion, but is http2 disabled by default, or do you need to set this experimental_enable_http2 config to true or false to disable http2?

ntkoopman commented 1 year ago

You have to set it to false to disable http2; it's enabled by default

Geal commented 1 year ago

It's enabled by default to keep the same default behaviour. Suddenly deactivating http/2 for all users wouldn't work well, and the intent here is to fix it properly soon

bashiru98 commented 1 year ago

@Geal if the router is communicating with the subgraphs using only http (insecure) connections, is there a need to set experimental_enable_http2 to false?

Geal commented 1 year ago

@bashiru98 you should not need it in that case

pp05 commented 1 year ago

@Geal I looked through the Apollo Router documentation and could not find anything on how to use the new disable option that you introduced in https://github.com/apollographql/router/pull/2621. I am new to Rust and to Apollo Router, so this could also be a novice question.

bnjjj commented 1 year ago

@pp05

traffic_shaping:
  all:
    experimental_enable_http2: false # to disable for all subgraphs
  subgraphs: # Rules applied to requests from the router to individual subgraphs
    products:
      experimental_enable_http2: false # only for "products" subgraph
pp05 commented 1 year ago

Thank you

pp05 commented 1 year ago

@Geal I have been tracking this for a while now and just wanted to check if you have managed to find a workaround that allows enabling HTTP/2 connections? Is there an issue logged with hyper? I also found this PR from yourself to use tower as an alternative library; has this gone any further into testing/performance testing etc.?

sclarke-rivian commented 12 months ago

@Geal it appears that the experimental_enable_http2 option has been removed and now throws a configuration error as of the more recent apollo-router releases? Is the issue here fixed, or has it been moved? I still see the option in https://github.com/apollographql/router/blob/d506c4b1a6629c692dacdc86614441e9c77e03ec/docs/shared/router-common-config.mdx#L29 and there appears to have been no mention in the release notes?

Geal commented 12 months ago

the option was changed in https://github.com/apollographql/router/blob/dev/CHANGELOG.md#http2-cleartext-protocol-h2c-support-for-subgraph-connections-issue-3535

from

experimental_enable_http2: false

to:

experimental_http2: disable # possible options: enable, disable, http2only

The configuration upgrade mechanism was supposed to call it out, but there were some issues in the way it could rewrite an option of that shape.

For a proper fix of this issue, we have to wait for the 1.0 release of the hyper HTTP client (which is in 1.0.0-rc.4 right now)