cloudflare / cloudflared

Cloudflare Tunnel client (formerly Argo Tunnel)
https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/tunnel-guide
Apache License 2.0

🐛 QUIC has slower response times than the http2 protocol (issue confirmed by Cloudflare support) #895

Open vazexqi opened 1 year ago

vazexqi commented 1 year ago

Describe the bug

QUIC is the default protocol for cloudflared, but it has performance issues, as observed in our perf tests and confirmed by Cloudflare support. The purpose of this ticket is to raise the issue in the open and give the community a way to keep track of it. It took us a week of communication with support before it was found to be a quic vs. http2 issue. We had to switch to http2.

For anyone internal triaging this, the ticket is at https://support.cloudflare.com/hc/en-us/requests/2690173

To Reproduce

  1. We are using cloudflare/cloudflared:2023.2.1-arm64
  2. Create a k8s cluster -- we are testing on GKE.
  3. Follow the steps at https://developers.cloudflare.com/cloudflare-one/tutorials/many-cfd-one-tunnel/ except that we are using the dashboard to create the mapping. Therefore, we use a TUNNEL_TOKEN.
  4. For our perf tests, we set up a deployment using the httpbin docker image.
  5. We ran a number of k6.io tests against the stream/100 endpoint of the httpbin deployment, using both http2 and quic.
  6. We noticed that overall http2 was much faster in terms of p95 response times. I've attached the graphs.

[Screenshots: 2023-01-30 at 9:44:46 AM and 2023-01-30 at 9:44:50 AM]
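
As an illustration of the comparison in steps 5 and 6, here is a minimal Python sketch that times repeated requests to an endpoint and reports a p95. It is not the k6 script used for the original tests. The function names and the nearest-rank percentile definition are assumptions made for this example, and k6 may compute percentiles differently.

```python
import time
import urllib.request


def p95(samples):
    """Return the 95th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n); avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def time_requests(url, count=50):
    """Fetch url `count` times and return each full-response time in seconds."""
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()  # drain the whole streamed body before stopping the clock
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `p95(time_requests(url))` once with the tunnel on http2 and once on quic, where `url` is the stream/100 endpoint behind the tunnel's public hostname, would give two numbers roughly comparable to the reported p95 response times.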

Additional context

Our Tunnels engineering manager looked at the escalation ticket and confirmed that performance using QUIC will be lower than using http/2, as you also noted.

In the case of QUIC vs HTTP2, it is important to note that the QUIC transport operates in user space, so it is normal for it to be more demanding in terms of CPU.

So, if you compare both under the same restrictions, QUIC is expected to perform slower.

Our EM is going to do some benchmark testing next quarter with an eye towards seeing if there are QUIC improvements that can be made.

For now, we are going to set TUNNEL_TRANSPORT_PROTOCOL to http2.
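
For anyone who launches cloudflared from a script rather than from a container spec, the workaround can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the accepted values (auto, http2, quic) reflect my understanding of the cloudflared protocol setting, and the command line assumes a token-based tunnel as in step 3.

```python
import os
import subprocess


def tunnel_environment(token, protocol="http2", base_env=None):
    """Return a copy of the environment with the tunnel settings applied."""
    # Assumed set of valid values for TUNNEL_TRANSPORT_PROTOCOL.
    if protocol not in ("auto", "http2", "quic"):
        raise ValueError(f"unsupported protocol: {protocol}")
    env = dict(os.environ if base_env is None else base_env)
    env["TUNNEL_TOKEN"] = token
    env["TUNNEL_TRANSPORT_PROTOCOL"] = protocol
    return env


def run_tunnel(token, protocol="http2"):
    """Start cloudflared with the chosen transport and wait for it to exit."""
    # Assumes the cloudflared binary is on PATH and reads the token from TUNNEL_TOKEN.
    command = ["cloudflared", "tunnel", "--no-autoupdate", "run"]
    return subprocess.run(command, env=tunnel_environment(token, protocol), check=True)
```

In a Kubernetes deployment the same effect comes from adding TUNNEL_TRANSPORT_PROTOCOL=http2 to the container's environment variables alongside TUNNEL_TOKEN.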

bompus commented 1 year ago

I just ran into this as well.

However, in my case, switching to http2 is not ideal, because I'm using unix domain sockets ( service: unix:/run/foo.sock ) for some upstream origins, and websockets apparently don't work with the combination of http2 + unix domain sockets ( https://github.com/cloudflare/cloudflared/issues/560#issuecomment-1030079345 ) - they only work over cloudflared via the quic protocol based on the linked issue.

realies commented 1 year ago

I can confirm that switching the protocol from quic to http2 makes the connection a lot faster. However, after running for a few hours, requests stop being served:

2023-03-10T16:43:40Z ERR  error="stream 309 canceled with error code 0" cfRay=7a5db807bfa7250e-LHR originService=https://nginx-proxy-manager
2023-03-10T16:43:40Z ERR Request failed error="stream 309 canceled with error code 0" connIndex=0 dest=https://example.com ip=123.123.123.123 type=http
2023-03-10T16:43:40Z ERR  error="stream 293 canceled with error code 0" cfRay=7a5db7fe28ea250e-LHR originService=https://nginx-proxy-manager
2023-03-10T16:43:40Z ERR Request failed error="stream 293 canceled with error code 0" connIndex=0 dest=https://example.com ip=123.123.123.123 type=http

bompus commented 1 year ago

I ran some throughput tests of my own, since there hasn't been any activity on this issue.

Download of a 1GB test file through cloudflared. I tested from two separate hosts.

Host 1:
  no tunnel - 149 MB/s
  protocol: http2 - 152 MB/s
  protocol: quic - 17 MB/s

Host 2:
  no tunnel - 89 MB/s
  protocol: http2 - 94 MB/s
  protocol: quic - 17 MB/s

CPU usage didn't seem to be an issue, so I'm hoping someone can look into this further.
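
For anyone who wants to repeat this kind of measurement, a minimal Python sketch of a download throughput check could look like the following. It is not the tool used for the numbers above. The function names are hypothetical, and MB is taken to mean 1,000,000 bytes, which is an assumption about the units reported.

```python
import time
import urllib.request


def throughput_mb_per_s(total_bytes, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 1,000,000 bytes)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return total_bytes / 1_000_000 / seconds


def drain(stream, chunk_size=1 << 20):
    """Read a stream to the end, discarding the data, and return the bytes read."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def measure_download(url):
    """Download url once and return the observed throughput in MB/s."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        total = drain(response)
    return throughput_mb_per_s(total, time.perf_counter() - start)
```

Running `measure_download` against the same large file directly, through the tunnel on http2, and through the tunnel on quic would give three figures to compare in the same way as the lists above.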

christidis commented 10 months ago

> For anyone internal triaging this, the ticket is at https://support.cloudflare.com/hc/en-us/requests/2690173

This is a 404 for me. Is this a private support ticket? Are there any updates on it? I have also experienced worse stress test results with QUIC compared to http2.

baflo commented 8 months ago

I can confirm this. I had this issue some years ago (and have used http2 since). On a new system I forgot about it and had slowness issues for months. After I changed to http2 yesterday, all is fine.

hieucd04 commented 3 weeks ago

Ran into this issue today. Disabling QUIC fixed it for me!