gruns opened this issue 2 years ago
Spoke with @DiegoRBaquero to tackle this issue. We concluded the following:
Test Setup:
Both servers use the same SSL certs and both serve the same generic HTML file. The test setup is documented and made to be easily reproducible in this repo: https://github.com/filecoin-saturn/http-testing.
Tests were run for a fixed duration of 10 minutes. The number of concurrent users was changed for each test.
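For reference, a minimal sketch of one such run, assuming h2load (the client used later in this thread; the repo's actual tool and flags may differ) and placeholder hostnames:
# 10-minute run, 30 concurrent clients: HTTP/2 against nginx, HTTP/3 against caddy
h2load --duration=600 --warm-up-time=5 -c 30 --npn-list h2 https://nginx.example
h2load --duration=600 --warm-up-time=5 -c 30 --npn-list h3 https://caddy.example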
Results:
NOTE: These results are now stale, please check comments below for updated results
Service | Protocol | TTFB Mean | Failure Rate | Reqs/s | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 4.77ms | 4% | 16215.11 | 5 |
Caddy | HTTP/3 | 9.38ms | 0% | 10942.86 | 5 |
Nginx | HTTP/2 | 55.7ms | 6.7% | 2143.27 | 30 |
Caddy | HTTP/3 | 53.22ms | 0% | 1866.49 | 30 |
Nginx | HTTP/2 | 123.8ms | 6.5% | 637.36 | 100 |
Caddy | HTTP/3 | 116.78ms | 0% | 573.19 | 100 |
Nginx | HTTP/2 | 289.95ms | 7.4% | 212.81 | 300 |
Caddy | HTTP/3 | 300.40ms | 0% | 223.59 | 300 |
Interesting that the failure rate was 0 with Caddy but significant with Nginx 🤔
@gruns @DiegoRBaquero Update on this issue. I noticed that I was pinging each service using a different host on my local machine. I changed it to be the same for both of them, which is the docker host for the benchmarking tool I was running. I re-ran all the tests with the same setup and the results now tell a different story on Caddy's HTTP/3 capabilities.
At 300 concurrent clients, we halve the TTFB compared to Nginx. One odd thing is that Nginx's failure rate decreases as the load increases on the webserver.
Service | Protocol | TTFB Mean | Failure Rate | Avg Reqs/s | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 18.08ms | 8% | 7874.58 | 5 |
Caddy | HTTP/3 | 10.71ms | 0% | 12600.37 | 5 |
Nginx | HTTP/2 | 53.44ms | 5.3% | 1429.01 | 30 |
Caddy | HTTP/3 | 44.31ms | 0% | 2161.78 | 30 |
Nginx | HTTP/2 | 204.73ms | 3% | 416.33 | 100 |
Caddy | HTTP/3 | 129.87ms | 0% | 633.46 | 100 |
Nginx | HTTP/2 | 730.78ms | 1.3% | 123.98 | 300 |
Caddy | HTTP/3 | 363.12ms | 0% | 156.38 | 300 |
@gruns @DiegoRBaquero I also set up a docker image that runs our nginx setup with http3 enabled. The image is in the http-testing repo here. For some reason, nginx's HTTP3 implementation is much slower than what we have right now. I contacted one of the developers and they mentioned that QUIC on nginx will probably not be ready until the end of 2023.
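For reproducibility, roughly how that image gets run (image name and paths are hypothetical; the http-testing repo is authoritative). One easy-to-miss detail: HTTP/3 runs over QUIC/UDP, so UDP 443 has to be published alongside TCP 443:
docker build -t nginx-http3 .
docker run --rm -p 443:443/tcp -p 443:443/udp nginx-http3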
Here are some of the testing results:
Service | Protocol | TTFB Mean | Failure Rate | Avg Reqs/s | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 53.44ms | 5.3% | 1429.01 | 30 |
Nginx | HTTP/3 | 271.17ms | 0.2% | 519.04 | 30 |
Nginx | HTTP/2 | 204.73ms | 3% | 416.33 | 100 |
Nginx | HTTP/3 | 1130ms | 0.4% | 141.82 | 100 |
Nginx | HTTP/2 | 730.78ms | 1.3% | 123.98 | 300 |
Nginx | HTTP/3 | 4460ms | 0% | 100.7 | 300 |
I ran more tests by deploying Nginx and Caddy on AWS lightsail instances in Frankfurt (eu-central-1).
With 5 Concurrent Clients:
Service | Protocol | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 348.20ms | 0.7% | 80.4 | 5 |
Caddy | HTTP/2 | 432.36ms | 0% | 71 | 5 |
Nginx | HTTP/3 | 222.2ms | 0% | 66.35 | 5 |
Caddy | HTTP/3 | 330.82ms | 0% | 88.6 | 5 |
With 30 Concurrent Clients:
Service | Protocol | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 389.85ms | 0.6% | 76.29 | 30 |
Caddy | HTTP/2 | 456.13ms | 0% | 62.32 | 30 |
Nginx | HTTP/3 | 250.52ms | 0% | 70.59 | 30 |
Caddy | HTTP/3 | 379.93ms | 0% | 82.13 | 30 |
With 100 Concurrent Clients:
Service | Protocol | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 454.82ms | 0.6% | 65.27 | 100 |
Caddy | HTTP/2 | 563.34ms | 0% | 36.8 | 100 |
Nginx | HTTP/3 | 751.78ms | 0.02% | 67 | 100 |
Caddy | HTTP/3 | 448.59ms | 0% | 32.59 | 100 |
promising!! 🎉
what's the ping/RTT between you (the client) and the frankfurt lightsail instance (the server)?
also interesting that it's inconsistent whether caddy or nginx is faster, depending on the # of concurrent clients. at 5 and 30 concurrent clients, nginx's http3 ttfb (220ms, 250ms) is faster than caddy's http3 ttfb (330ms, 380ms), but at 100 concurrent clients caddy's http3 ttfb (450ms) is lower than nginx's http3 ttfb (750ms). interesting. and weird. 🤔
also nginx's http3 ttfb (750ms) was way worse than nginx's http2 ttfb (450ms) at 100 concurrent clients. besides that one measurement (nginx's http3 ttfb at 100 concurrents) the data makes sense and http3's ttfb is lower across the board for all # of concurrents for both nginx and caddy
@AmeanAsad worth re-running the experiment with 100 concurrent clients and, perhaps, 200, too. if your computer was the client, maybe something else was using bw in the background while the nginx http3 ttfb test ran?
jk. likely not an aberration. from looking at the local benchmarks posted above, nginx's http3 implementation seems to choke on high numbers of concurrent clients. as shown in these benchmarks:
Service | Protocol | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/2 | 204.73ms | 3% | 416.33 | 100 |
Nginx | HTTP/3 | 1130ms | 0.4% | 141.82 | 100 |
Nginx | HTTP/2 | 730.78ms | 1.3% | 123.98 | 300 |
Nginx | HTTP/3 | 4460ms | 0% | 100.7 | 300 |
@gruns Pinged both servers using the ping <server-ip> command. The time hovers at 100ms consistently.
Result from terminal:
round-trip min/avg/max/stddev = 99.916/101.709/103.361/1.251 ms
@gruns @DiegoRBaquero
In the previous update, http3 performs worse than http2 at 100 concurrent clients. The intuitive assumption is that the trend continues as concurrency rises, but further testing shows otherwise. I tested this multiple times and even set up another VPS in Singapore to verify, but it seems like there is a concurrency range where http2 is just better. That part is still puzzling. Otherwise, across the board, http3 is a clear winner.
Service | Protocol | TTFB Mean | Reqs/s | Concurrent Clients |
---|---|---|---|---|
Nginx | HTTP/2 | 2.88s | 31 | 200 |
Nginx | HTTP/3 | 1.05s | 46.27 | 200 |
Nginx | HTTP/2 | 10.81s | 24.4 | 300 |
Nginx | HTTP/3 | 2.46s | 35.1 | 300 |
Service | Protocol | TTFB Mean | Avg Reqs/s | Concurrent Clients |
---|---|---|---|---|
Nginx | HTTP/2 | 2.88s | 31 | 200 |
Caddy | HTTP/2 | 1.05s | 12.59 | 200 |
Nginx | HTTP/2 | 729.21ms | 13.87 | 200 |
Nginx's CPU consumption:
Caddy's CPU consumption under the same load:
next questions:
how do browsers, eg chrome and firefox, know when to connect to a site with http3 over http2?
if only after the Alt-Svc header (https://http3-explained.haxx.se/en/h3/h3-altsvc) has been sent over http2, do browsers remember that header so subsequent connections are made with http3 instead of http2?
what are the benchmarks for one client, one request at a time? like with curl (https://curl.se/docs/http3.html)? (see the sketch after these questions)
do we see a clear 1RTT for http3 and 3RTT for http2?
nginx's http3 implementation outperforms caddy's http3 implementation, while both http3 implementations are in beta. can we use nginx's http3 support in production while in beta? what work remains outstanding in nginx's http3 support? what is missing? incomplete? broken? what risks do we take on by adopting nginx's current http3 support in production?
what work needs to be done to benchmark http3 vs http2 in our test network?
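A sketch for the one-client curl question above (assumes a curl build with HTTP/3 support; the domain is a placeholder):
# single request each way, printing connect time and TTFB
curl --http3 -o /dev/null -sS -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s\n' https://nginx.example
curl --http2 -o /dev/null -sS -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s\n' https://nginx.example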
@gruns Addressing some of the questions above:
Once an HTTP3 connection is established, does the browser remember that for subsequent connections?: Yes. Browsers cache the Alt-Svc header. The caching is identified by a param called ma -> max age. This param is configurable in our server setup. Here is an example of how I used it in the Nginx http3 setup.
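A hedged sketch of both halves of the mechanism (the directive and domain are illustrative, not the actual repo config):
# server side (nginx): advertise HTTP/3 on UDP 443, cached by clients for 86400s (24h)
#   add_header Alt-Svc 'h3=":443"; ma=86400';
# client side: curl's Alt-Svc cache records the mapping on the first request,
# and an h3-capable curl can then use HTTP/3 for the repeat request
curl --alt-svc altsvc.txt -sI https://nginx.example >/dev/null
cat altsvc.txt   # cached h3 entry plus its expiry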
Source
Benchmarks for 1 Client:
Protocol | TTFB | Time to Connect |
---|---|---|
HTTP/2 | 331.97ms | 221.38ms |
HTTP/3 | 217.76ms | 111.79ms |
Does the benchmarking tool open new connections for each request?:
How ready is nginx's http3?:
What work needs to be done to benchmark http3 vs http2 in our test network?: Gonna also tag @DiegoRBaquero, since he probably has a better idea of this than me. Here is what I imagine the work will be like:
Can a page, which loaded over http2 (ie a page with arc), request an asset from another domain, eg strn.pl, over http3?: Yes. If a client receives an Alt-Svc header that indicates HTTP/3, it has the option to attempt to set up a QUIC connection to that destination and, if successful, continue communicating with the origin like that instead of the initial HTTP version. This is of course conditional on the fact that the client and server must both support QUIC and HTTP/3.
sickkkk. remaining todos now for @joaosa with the baton 💪:
[x] test how chrome and firefox (https://caniuse.com/http3) actually implement http3 connections
- do they only use http3 after receiving an Alt-Svc upgrade response header?
- Alt-Svc's caniuse score is 50% (https://caniuse.com/mdn-http_headers_alt-svc) while http3's caniuse score is 75% (https://caniuse.com/http3)
[x] determine the state of nginx's http3 support. what is missing? what is broken? can we put nginx's http3 implementation into production in saturn? hopefully 🤞
[x] if everything above is good, add http3 support to l1s
[x] deploy the http3 l1s to the test network
[x] begin logging the ttfb performance difference between http2 and http3 in the test network (for @guanzo)
[ ] graph the http2 vs http3 log data on the grafana dashboards (for @joaosa or @AmeanAsad)
[ ] when all looks good and things have proven stable in the test network, push to production and pop that champagne 🎉
@joaosa once you get a sense of the above items, sync with me (@gruns) on an estimated timeline for the above 👍
Alright, I ran some benchmarks with Envoy and HAProxy acting as reverse-proxies and with simplehttp2server as their backend. These proxies use http3/http2 downstream and http2 upstream.
I tried to replicate Amean's setup in order to have "comparable" results (refer to here). I ran the benchmarking tool (h2load) from my machine towards a couple of lightsail Ubuntu 20.04 machines I deployed.
I generated certs for them with mkcert and modified my /etc/hosts file to be able to reach them.
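Roughly, the reproduction steps look like this (IPs are placeholders for the lightsail machines):
mkcert envoy.io haproxy.io          # locally-trusted certs for the fake hostnames
echo '203.0.113.10 envoy.io'   | sudo tee -a /etc/hosts
echo '203.0.113.11 haproxy.io' | sudo tee -a /etc/hosts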
I'm going to comment on the results scenario by scenario (http2 vs http3), explain a couple of things I tried and then go for the final benchmarks. @DiegoRBaquero @AmeanAsad @gruns It would be great to have your input on this. Hopefully, together we'll uncover more stuff from these results!
docker run --rm -it --network=host h2load-http3 -n 10000 -c 100 -m 10 --npn-list h3 https://haproxy.io
finished in 6.37s, 1570.85 req/s, 1.12MB/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 7.13MB (7480000) total, 2.91MB (3050000) headers (space savings -5.90%), 4.17MB (4370000) data
UDP datagram: 6351 sent, 21426 received
min max mean sd +/- sd
time for request: 43.68ms 2.06s 526.60ms 323.54ms 80.59%
time for connect: 56.76ms 1.12s 311.52ms 380.30ms 81.00%
time to 1st byte: 314.43ms 2.63s 1.02s 556.59ms 76.00%
req/s : 15.72 26.87 17.86 2.10 85.00%
docker run --rm -it --network=host h2load-http3 -n 10000 -c 100 -m 10 --npn-list h2 https://haproxy.io
finished in 6.38s, 1566.83 req/s, 1.01MB/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 6.45MB (6763600) total, 2.11MB (2210000) headers (space savings 23.26%), 4.17MB (4370000) data
min max mean sd +/- sd
time for request: 46.24ms 2.96s 543.88ms 377.94ms 80.79%
time for connect: 92.19ms 209.09ms 153.40ms 31.77ms 60.00%
time to 1st byte: 241.91ms 1.93s 721.64ms 343.79ms 77.00%
req/s : 15.67 27.09 17.74 2.15 89.00%
Clearly, h2's TTFB is way better than h3's. Also, those max values on h3 aren't looking good. Support for h3 in HAProxy is experimental, so this probably needs to be revisited as it evolves.
docker run --rm -it --network=host h2load-http3 -n 10000 -c 100 -m 10 --npn-list h3 https://envoy.io
finished in 9.12s, 1096.93 req/s, 499.14KB/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.44MB (4659527) total, 127.00KB (130050) headers (space savings 95.70%), 4.21MB (4410000) data
UDP datagram: 10436 sent, 16728 received
min max mean sd +/- sd
time for request: 46.87ms 1.18s 283.23ms 171.84ms 72.76%
time for connect: 78.60ms 7.28s 2.64s 2.89s 74.00%
time to 1st byte: 197.01ms 7.46s 2.98s 2.80s 74.00%
req/s : 10.99 34.94 20.29 6.55 57.00%
docker run --rm -it --network=host h2load-http3 -n 10000 -c 100 -m 10 --npn-list h2 https://envoy.io
finished in 6.18s, 1618.58 req/s, 764.33KB/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.61MB (4835551) total, 234.42KB (240051) headers (space savings 93.58%), 4.21MB (4410000) data
min max mean sd +/- sd
time for request: 50.65ms 2.50s 550.51ms 381.71ms 84.86%
time for connect: 94.38ms 258.03ms 159.05ms 43.32ms 65.00%
time to 1st byte: 447.23ms 1.10s 675.79ms 251.45ms 75.00%
req/s : 16.20 23.58 17.26 1.27 88.00%
Envoy seems to have a slightly higher per-client h3 req/s (20.29 mean) when compared to its h2 run and both approaches for HAProxy, even though its total req/s is lower. The TTFB/connect values in this attempt are terrible though. Thankfully, we later managed to solve this.
By now, we have established h2 is working fine, so let's try to improve h3. One thing we can improve is the size of the UDP buffers (which Envoy thankfully complained about). I took the values from here.
net.core.rmem_max and net.core.rmem_default tweaks
Ran sudo sysctl -w net.core.rmem_max=26214400 and sudo sysctl -w net.core.rmem_default=26214400 beforehand.
I tried larger values, but that didn't seem to make a relevant difference.
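Side note: sysctl -w is not persistent. One standard way to keep the buffer sizes across reboots (file name is arbitrary):
printf 'net.core.rmem_max=26214400\nnet.core.rmem_default=26214400\n' | sudo tee /etc/sysctl.d/99-quic-buffers.conf
sudo sysctl --system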
finished in 7.23s, 1383.58 req/s, 631.24KB/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.46MB (4671842) total, 126.95KB (130000) headers (space savings 95.71%), 4.21MB (4410000) data
UDP datagram: 10907 sent, 16830 received
min max mean sd +/- sd
time for request: 47.21ms 2.31s 633.94ms 297.75ms 71.00%
time for connect: 78.47ms 565.65ms 397.77ms 112.46ms 55.00%
time to 1st byte: 384.95ms 1.47s 898.50ms 255.63ms 53.00%
req/s : 13.87 17.09 14.41 0.57 88.00%
This is great! Envoy's TTFB performance looks like what we would expect, even if its throughput values decreased.
finished in 83.14s, 14.43 req/s, 10.54KB/s
requests: 10000 total, 2080 started, 1200 done, 1200 succeeded, 8800 failed, 8800 errored, 0 timeout
status codes: 1200 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 876.56KB (897600) total, 357.42KB (366000) headers (space savings -5.90%), 512.11KB (524400) data
UDP datagram: 3474 sent, 3772 received
min max mean sd +/- sd
time for request: 47.75ms 386.50ms 98.43ms 75.74ms 90.00%
time for connect: 50.83ms 185.47ms 114.69ms 41.94ms 55.00%
time to 1st byte: 298.00ms 437.64ms 374.51ms 46.10ms 58.33%
req/s : 0.00 98.87 11.16 30.39 88.00%
RIP HAProxy. This approach clearly didn't help and it even introduced consistent failed requests into the mix. Definitely a no-go.
With this in mind, let's focus a bit more on the tweaked Envoy and regular HAProxy and see where this goes.
From here on, I tried reproducing Amean's test setup and 10-minute benchmarks for Envoy with 100, 200, and 300 clients.
One thing I immediately noticed with docker run --rm -it --network=host h2load-http3 -c100 -m200 --duration=600 --warm-up-time=5 --npn-list h3 https://envoy.io is that Envoy started spiking gloriously to 99% CPU usage. When the test finished we had about 93% failed reqs (bad Envoy).
I initially assumed this meant Envoy was a lot less resource efficient than Nginx. I spawned a more powerful machine to try to make this work (4 vs 1 vCPUs and 16x more memory as a side-effect). Still, I got failed requests.
In the end, it turned out it was the max concurrent streams to issue per client (aka -m) causing all kinds of mischief (we should probably revisit stream concurrency later).
I went back to the smaller machine, so as not to give Envoy an unfair advantage.
For clarity's sake here are the runs and their results:
docker run --rm -it --network=host h2load-http3 -c100 --duration=600 --npn-list h3 https://envoy.io
finished in 600.02s, 1116.12 req/s, 507.63KB/s
requests: 669675 total, 669775 started, 669675 done, 669675 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 669675 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 297.44MB (311889144) total, 8.41MB (8820584) headers (space savings 95.64%), 281.65MB (295326675) data
UDP datagram: 2353223 sent, 2091152 received
min max mean sd +/- sd
time for request: 42.22ms 883.66ms 89.54ms 21.89ms 81.58%
time for connect: 74.85ms 426.47ms 356.05ms 65.12ms 71.00%
time to 1st byte: 353.05ms 548.92ms 448.12ms 60.78ms 71.00%
req/s : 10.62 11.82 11.16 0.35 62.00%
docker run --rm -it --network=host h2load-http3 -c200 --duration=600 --npn-list h3 https://envoy.io
finished in 600.03s, 1177.70 req/s, 539.16KB/s
requests: 706622 total, 706822 started, 706622 done, 706622 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 706691 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 315.91MB (331259764) total, 8.86MB (9292259) headers (space savings 95.65%), 297.18MB (311620302) data
UDP datagram: 2767907 sent, 2295509 received
min max mean sd +/- sd
time for request: 42.84ms 1.27s 169.59ms 47.65ms 72.94%
time for connect: 73.29ms 1.36s 667.19ms 348.45ms 71.00%
time to 1st byte: 463.84ms 1.52s 845.13ms 338.81ms 79.50%
req/s : 5.63 6.21 5.89 0.16 60.00%
docker run --rm -it --network=host h2load-http3 -c300 --duration=600 --npn-list h3 https://envoy.io
finished in 600.05s, 1225.51 req/s, 564.06KB/s
requests: 735307 total, 735607 started, 735307 done, 735307 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 735318 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 330.50MB (346559021) total, 9.20MB (9643006) headers (space savings 95.67%), 309.25MB (324270387) data
UDP datagram: 2672506 sent, 2158533 received
min max mean sd +/- sd
time for request: 43.50ms 811.13ms 244.33ms 62.76ms 73.98%
time for connect: 74.73ms 3.35s 1.04s 690.06ms 76.00%
time to 1st byte: 366.25ms 3.62s 1.24s 702.55ms 87.33%
req/s : 3.96 4.21 4.08 0.07 59.67%
Following the same approach as with Envoy:
docker run --rm -it --network=host h2load-http3 -c100 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.04s, 1424.00 req/s, 1.02MB/s
requests: 854398 total, 854498 started, 854398 done, 854398 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 854421 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 609.49MB (639096788) total, 248.53MB (260598405) headers (space savings -5.90%), 356.08MB (373371926) data
UDP datagram: 2524842 sent, 2647903 received
min max mean sd +/- sd
time for request: 42.13ms 374.40ms 70.15ms 7.74ms 74.16%
time for connect: 53.00ms 3.07s 569.44ms 733.84ms 95.00%
time to 1st byte: 171.35ms 3.14s 689.34ms 708.79ms 95.00%
req/s : 13.93 14.59 14.24 0.21 55.00%
docker run --rm -it --network=host h2load-http3 -c200 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.06s, 1545.79 req/s, 1.10MB/s
requests: 927476 total, 927676 started, 927476 done, 927476 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 927529 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 661.63MB (693768372) total, 269.79MB (282897565) headers (space savings -5.90%), 386.53MB (405307012) data
UDP datagram: 2755539 sent, 2891841 received
min max mean sd +/- sd
time for request: 42.96ms 613.39ms 129.14ms 18.31ms 85.19%
time for connect: 60.12ms 3.12s 1.04s 1.09s 81.00%
time to 1st byte: 197.06ms 3.51s 1.26s 1.10s 81.00%
req/s : 7.65 7.80 7.73 0.03 65.50%
docker run --rm -it --network=host h2load-http3 -c300 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.07s, 1481.49 req/s, 1.06MB/s
requests: 888892 total, 889151 started, 888892 done, 888892 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 888906 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 634.09MB (664895528) total, 258.56MB (271116635) headers (space savings -5.90%), 370.46MB (388451922) data
UDP datagram: 2667025 sent, 2788067 received
min max mean sd +/- sd
time for request: 44.44ms 616.11ms 174.32ms 22.75ms 89.36%
time for connect: 70.68ms 7.14s 1.65s 2.02s 91.51%
time to 1st byte: 226.89ms 7.23s 1.88s 1.98s 91.51%
req/s : 0.00 5.78 4.94 1.97 86.33%
Taking Amean's results and these, here we go:
Service | Protocol | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|
Caddy | HTTP/3 | 448.59ms | 0.00% | 32.59 | 100 |
Nginx | HTTP/3 | 751.78ms | 0.02% | 67 | 100 |
Envoy | HTTP/3 | 448.12ms | 0.00% | 11.16 | 100 |
HAProxy | HTTP/3 | 689.34ms | 0.00% | 14.24 | 100 |
Envoy | HTTP/3 | 845.13ms | 0.00% | 5.89 | 200 |
HAProxy | HTTP/3 | 1.26s | 0.00% | 7.73 | 200 |
Envoy | HTTP/3 | 1.24s | 0.00% | 4.08 | 300 |
HAProxy | HTTP/3 | 1.88s | 0.00% | 4.94 | 300 |
Envoy's TTFB values seem pretty decent given what we've seen in terms of h3 overall (it uses Google's QUICHE library). Also, the devs say "HTTP/3 downstream support is ready for production use, but continued improvements are coming (...)" over here. HAProxy appears to be doing less well, but with a slightly higher throughput.
One aspect I'm concerned about in both scenarios is perceived throughput, which seems to be lower than with NGINX/Caddy. Given I'm using a reverse-proxy backend (with h2) for both HAProxy and Envoy, results may not be exactly comparable, as there is another moving piece whose performance could be impacting the results. What are your thoughts on this?
Either way, I warrant these are good enough reasons to try Envoy (and possibly HAProxy) out in more realistic scenarios. What do you all think?
I decided to benchmark nginx as well to allow for comparisons following the same methodology as described above. I added a proxy backend for nginx. We're not caching responses, so as to get fairer values.
I should note I'm using httpd as an http1.1 backend here.
nginx (without rmem changes)
docker run --rm -it --network=host h2load-http3 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.04s, 166.53 req/s, 122.54KB/s
requests: 99917 total, 100000 started, 99917 done, 99917 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 71.80MB (75291700) total, 24.13MB (25300000) headers (space savings 27.92%), 46.83MB (49100000) data
UDP datagram: 236898 sent, 300403 received
min max mean sd +/- sd
time for request: 44.84ms 1.27s 116.58ms 53.29ms 81.02%
time for connect: 53.03ms 3.12s 749.35ms 841.66ms 92.00%
time to 1st byte: 161.89ms 3.23s 905.40ms 849.89ms 92.00%
req/s : 8.26 8.97 8.52 0.11 75.00%
docker run --rm -it --network=host h2load-http3 -c200 --duration=600 --npn-list h3 https://nginx.io
finished in 600.11s, 47.81 req/s, 35.13KB/s
requests: 28685 total, 28872 started, 28685 done, 28685 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 28685 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 20.58MB (21583959) total, 6.92MB (7257305) headers (space savings 27.92%), 13.43MB (14084335) data
UDP datagram: 61918 sent, 87504 received
min max mean sd +/- sd
time for request: 44.94ms 1.10s 214.38ms 66.03ms 85.44%
time for connect: 52.72ms 7.26s 2.67s 2.75s 76.47%
time to 1st byte: 178.57ms 8.35s 3.02s 2.90s 76.47%
req/s : 0.00 2.79 2.19 0.65 93.50%
docker run --rm -it --network=host h2load-http3 -c300 --duration=600 --npn-list h3 https://nginx.io
finished in 600.09s, 326.53 req/s, 256.78KB/s
requests: 195919 total, 196000 started, 195919 done, 195919 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 196000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 150.46MB (157763732) total, 47.29MB (49588000) headers (space savings 27.92%), 101.50MB (106428000) data
UDP datagram: 398033 sent, 589867 received
min max mean sd +/- sd
time for request: 44.05ms 1.05s 202.65ms 56.23ms 82.53%
time for connect: 67.43ms 7.24s 2.54s 2.71s 78.06%
time to 1st byte: 193.00ms 8.27s 2.87s 2.86s 78.06%
req/s : 0.00 5.04 3.18 2.32 65.33%
These last two results aren't good. Thankfully, h2load decided to present me with ERR_DRAINING errors. After some googling, I decided to try out tweaking rmem, as we had previously done with Envoy and HAProxy.
nginx (with rmem changes)
sudo sysctl -w net.core.rmem_max=26214400 && sudo sysctl -w net.core.rmem_default=26214400
docker run --rm -it --network=host h2load-http3 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.05s, 166.54 req/s, 131.01KB/s
requests: 99923 total, 100000 started, 99923 done, 99923 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 76.76MB (80491700) total, 24.13MB (25300000) headers (space savings 27.92%), 51.78MB (54300000) data
UDP datagram: 235805 sent, 300736 received
min max mean sd +/- sd
time for request: 65.04ms 884.09ms 117.02ms 52.10ms 80.88%
time for connect: 53.70ms 256.58ms 153.87ms 59.01ms 56.00%
time to 1st byte: 258.63ms 395.89ms 326.88ms 41.13ms 56.00%
req/s : 8.36 8.75 8.53 0.08 71.00%
docker run --rm -it --network=host h2load-http3 -c200 --duration=600 --npn-list h3 https://nginx.io
finished in 600.07s, 48.59 req/s, 38.17KB/s
requests: 29156 total, 29356 started, 29156 done, 29156 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 29156 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 22.37MB (23453633) total, 7.03MB (7376468) headers (space savings 27.92%), 15.10MB (15831708) data
UDP datagram: 60628 sent, 88193 received
min max mean sd +/- sd
time for request: 171.05ms 930.30ms 245.49ms 46.77ms 82.83%
time for connect: 49.91ms 452.35ms 250.84ms 117.38ms 58.50%
time to 1st byte: 452.46ms 725.45ms 589.88ms 79.27ms 57.50%
req/s : 2.13 2.25 2.21 0.02 69.50%
docker run --rm -it --network=host h2load-http3 -c300 --duration=600 --npn-list h3 https://nginx.io
finished in 600.11s, 499.93 req/s, 393.03KB/s
requests: 299957 total, 300000 started, 299957 done, 299957 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 300000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 230.29MB (241475100) total, 72.38MB (75900000) headers (space savings 27.92%), 155.35MB (162900000) data
UDP datagram: 603054 sent, 902439 received
min max mean sd +/- sd
time for request: 66.37ms 1.24s 374.83ms 31.22ms 97.20%
time for connect: 68.27ms 670.26ms 355.15ms 175.99ms 57.67%
time to 1st byte: 651.95ms 1.12s 884.57ms 131.39ms 57.33%
req/s : 2.65 2.68 2.67 0.01 69.33%
Including the nginx results with the rmem tweaks. It seems like nginx is after all the best performer in terms of TTFB (even if with a smaller throughput).
I suppose that this means there's another thing we can try nginx-wise. We can tweak net.core.rmem_max and net.core.rmem_default server-side and see where that goes. Any thoughts?
Service | Protocol | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|
Nginx | HTTP/3 | 326.88ms | 0.00% | 8.53 | 100 |
Envoy | HTTP/3 | 448.12ms | 0.00% | 11.16 | 100 |
HAProxy | HTTP/3 | 689.34ms | 0.00% | 14.24 | 100 |
Nginx | HTTP/3 | 589.88ms | 0.00% | 2.21 | 200 |
Envoy | HTTP/3 | 845.13ms | 0.00% | 5.89 | 200 |
HAProxy | HTTP/3 | 1.26s | 0.00% | 7.73 | 200 |
Nginx | HTTP/3 | 884.57ms | 0.00% | 2.67 | 300 |
Envoy | HTTP/3 | 1.24s | 0.00% | 4.08 | 300 |
HAProxy | HTTP/3 | 1.88s | 0.00% | 4.94 | 300 |
@joaosa Really interesting results. The rmem changes for nginx seem to give really good ttfb results compared to everything else. I am curious how concurrent streams impact the results and how applicable that is to Saturn, since all the recent results use 1 concurrent stream by default?
Let's look into concurrent streams (the results above used h2load's default of m=1). From here and the RFC: "It is recommended that this value be no smaller than 100, so as to not unnecessarily limit parallelism."
docker run --rm -it --network=host h2load-http3 -m10 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.07s, 166.66 req/s, 131.00KB/s
requests: 100002 total, 100143 started, 100002 done, 99995 succeeded, 7 failed, 4 errored, 0 timeout
status codes: 99997 2xx, 0 3xx, 0 4xx, 3 5xx
traffic: 76.76MB (80489033) total, 24.13MB (25299403) headers (space savings 27.92%), 51.78MB (54298821) data
UDP datagram: 229913 sent, 266437 received
min max mean sd +/- sd
time for request: 42.17ms 53.25s 1.06s 789.42ms 88.03%
time for connect: 48.69ms 262.68ms 162.26ms 59.54ms 58.00%
time to 1st byte: 271.19ms 15.58s 1.25s 2.16s 97.00%
req/s : 9.21 9.95 9.39 0.15 79.00%
Interesting to see how performance degraded substantially here.
docker run --rm -it --network=host h2load-http3 -m10 -c100 --duration=600 --npn-list h3 https://envoy.io
finished in 600.04s, 1491.68 req/s, 680.32KB/s
requests: 895009 total, 896009 started, 895009 done, 894608 succeeded, 401 failed, 0 errored, 0 timeout
status codes: 894750 2xx, 0 3xx, 0 4xx, 401 5xx
traffic: 398.62MB (417987614) total, 11.13MB (11667962) headers (space savings 95.70%), 376.29MB (394563598) data
UDP datagram: 958395 sent, 1286893 received
min max mean sd +/- sd
time for request: 84.11ms 3.86s 669.28ms 297.25ms 73.73%
time for connect: 98.56ms 885.24ms 459.46ms 133.31ms 65.00%
time to 1st byte: 411.02ms 1.40s 899.76ms 252.07ms 59.00%
req/s : 14.55 15.19 14.92 0.13 68.00%
Performance degraded.
docker run --rm -it --network=host h2load-http3 -m10 -c100 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.06s, 1721.59 req/s, 1.23MB/s
requests: 1032956 total, 1033956 started, 1032956 done, 1032956 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 1033070 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 736.89MB (772686200) total, 300.49MB (315086350) headers (space savings -5.90%), 430.49MB (451401772) data
UDP datagram: 568336 sent, 2225495 received
min max mean sd +/- sd
time for request: 48.81ms 6.32s 580.12ms 485.98ms 86.04%
time for connect: 50.72ms 1.14s 334.37ms 419.83ms 77.00%
time to 1st byte: 281.95ms 2.70s 948.17ms 500.56ms 63.00%
req/s : 15.86 18.19 17.22 0.49 70.00%
Performance degraded.
docker run --rm -it --network=host h2load-http3 -m50 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.04s, 123.32 req/s, 95.16KB/s
requests: 74163 total, 75683 started, 74163 done, 72126 succeeded, 2037 failed, 171 errored, 0 timeout
status codes: 72126 2xx, 0 3xx, 0 4xx, 1866 5xx
traffic: 55.76MB (58464682) total, 17.50MB (18348642) headers (space savings 28.01%), 37.63MB (39463148) data
UDP datagram: 150490 sent, 171274 received
min max mean sd +/- sd
time for request: 42.60ms 92.65s 3.35s 11.72s 96.20%
time for connect: 50.27ms 249.07ms 149.91ms 59.04ms 57.00%
time to 1st byte: 261.25ms 17.12s 2.22s 2.19s 86.52%
req/s : 0.00 21.98 9.67 4.79 75.00%
Worse than m=10.
docker run --rm -it --network=host h2load-http3 -m50 -c100 --duration=600 --npn-list h3 https://envoy.io
finished in 600.07s, 5799.28 req/s, 742.80KB/s
requests: 3479569 total, 3484569 started, 3479569 done, 313151 succeeded, 3166418 failed, 0 errored, 0 timeout
status codes: 313348 2xx, 0 3xx, 0 4xx, 3166439 5xx
traffic: 435.24MB (456377946) total, 28.08MB (29443038) headers (space savings 93.55%), 376.32MB (394598162) data
UDP datagram: 219142 sent, 512249 received
min max mean sd +/- sd
time for request: 47.83ms 10.43s 848.50ms 1.06s 88.02%
time for connect: 80.27ms 1.30s 759.14ms 354.78ms 61.00%
time to 1st byte: 663.80ms 4.04s 1.77s 1.11s 75.00%
req/s : 52.71 62.81 57.99 2.16 71.00%
Worse than m=10 and with lots of failed requests.
docker run --rm -it --network=host h2load-http3 -m50 -c100 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.04s, 2086.55 req/s, 1.40MB/s
requests: 1251928 total, 1256928 started, 1251928 done, 1156956 succeeded, 94972 failed, 0 errored, 0 timeout
status codes: 1157233 2xx, 0 3xx, 0 4xx, 94972 5xx
traffic: 842.06MB (882963252) total, 343.04MB (359702737) headers (space savings -5.84%), 491.86MB (515755272) data
UDP datagram: 354715 sent, 1956774 received
min max mean sd +/- sd
time for request: 43.53ms 33.43s 1.51s 2.38s 89.70%
time for connect: 49.88ms 1.12s 375.18ms 444.18ms 73.00%
time to 1st byte: 225.79ms 10.39s 3.45s 2.34s 73.44%
req/s : 0.00 39.82 20.87 15.91 56.00%
Worse than m=10 and with quite a few failed requests.
docker run --rm -it --network=host h2load-http3 -m100 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.09s, 60.68 req/s, 46.81KB/s
requests: 37467 total, 44776 started, 37467 done, 35480 succeeded, 1987 failed, 1058 errored, 0 timeout
status codes: 35480 2xx, 0 3xx, 0 4xx, 929 5xx
traffic: 27.43MB (28760970) total, 8.61MB (9026606) headers (space savings 28.01%), 18.52MB (19418020) data
UDP datagram: 72017 sent, 81116 received
min max mean sd +/- sd
time for request: 40.41ms 73.59s 4.89s 13.20s 93.95%
time for connect: 48.85ms 248.64ms 149.19ms 59.13ms 58.00%
time to 1st byte: 261.89ms 4.79s 1.71s 819.79ms 82.54%
req/s : 0.00 64.01 9.31 13.15 93.00%
Way worse than m=50. Lots of ERR_DRAINING and ERR_CALLBACK_FAILURE errors to note here.
docker run --rm -it --network=host h2load-http3 -m100 -c100 --duration=600 --npn-list h3 https://envoy.io
finished in 600.09s, 6284.10 req/s, 748.49KB/s
requests: 3770462 total, 3780462 started, 3770462 done, 245039 succeeded, 3525423 failed, 0 errored, 0 timeout
status codes: 245111 2xx, 0 3xx, 0 4xx, 3525464 5xx
traffic: 438.57MB (459872382) total, 30.05MB (31511292) headers (space savings 93.39%), 375.44MB (393677033) data
UDP datagram: 171388 sent, 483205 received
min max mean sd +/- sd
time for request: 61.39ms 12.32s 1.31s 1.45s 88.24%
time for connect: 78.20ms 5.22s 957.55ms 674.79ms 72.00%
time to 1st byte: 484.15ms 6.46s 1.76s 1.16s 72.00%
req/s : 55.18 70.63 62.84 2.91 70.00%
Similar to m=50, but with a really high amount of failed requests.
docker run --rm -it --network=host h2load-http3 -m100 -c100 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.06s, 2.63 req/s, 1.92KB/s
requests: 1579 total, 11579 started, 1579 done, 1579 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 1579 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.13MB (1181092) total, 470.31KB (481595) headers (space savings -5.90%), 673.85KB (690023) data
UDP datagram: 2532 sent, 2589 received
min max mean sd +/- sd
time for request: 72.63ms 10.41s 2.37s 2.63s 85.75%
time for connect: 52.97ms 1.16s 372.71ms 452.19ms 75.00%
time to 1st byte: 1.35s 9.95s 3.62s 2.34s 91.67%
req/s : 0.00 19.01 0.41 2.05 95.00%
finished in 600.05s, 5.26 req/s, 3.88KB/s
requests: 3154 total, 13154 started, 3154 done, 3154 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 3229 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 2.27MB (2382292) total, 961.76KB (984845) headers (space savings -5.90%), 1.31MB (1378298) data
UDP datagram: 2419 sent, 4472 received
min max mean sd +/- sd
time for request: 137.99ms 8.10s 3.13s 1.79s 56.37%
time for connect: 60.54ms 3.07s 441.80ms 890.00ms 90.00%
time to 1st byte: 2.12s 6.01s 4.53s 981.35ms 60.00%
req/s : 0.00 6.27 0.88 1.76 82.00%
We get really low throughput or no results at all.
Increasing the number of max concurrent streams seems to have a negative impact on both Nginx and HAProxy. That was the case on Envoy (from 1 to 50), but then it seemed to stabilize. Given both HAProxy and Nginx started getting really low throughput, I think something is clearly off. I'm going to try more things (namely enabling GSO for Nginx. See more here).
Here's the summary:
Service | Protocol | M | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|---|
Nginx | HTTP/3 | 1 | 326.88ms | 0.00% | 8.53 | 100 |
Nginx | HTTP/3 | 10 | 1.25s | 1.00% | 9.39 | 100 |
Nginx | HTTP/3 | 50 | 2.22s | 3.00% | 9.67 | 100 |
Nginx | HTTP/3 | 100 | 1.71s | 6.00% | 9.31 | 100 |
Envoy | HTTP/3 | 1 | 448.12ms | 0.00% | 11.16 | 100 |
Envoy | HTTP/3 | 10 | 899.76ms | 1.00% | 14.92 | 100 |
Envoy | HTTP/3 | 50 | 1.77s | 92.00% | 57.99 | 100 |
Envoy | HTTP/3 | 100 | 1.76s | 94.00% | 62.84 | 100 |
HAProxy | HTTP/3 | 1 | 689.34ms | 0.00% | 14.24 | 100 |
HAProxy | HTTP/3 | 10 | 948.17ms | 0.00% | 17.22 | 100 |
HAProxy | HTTP/3 | 50 | 3.45s | 8.00% | 20.87 | 100 |
HAProxy | HTTP/3 | 100 | 4.53s | 0.00% | 0.88 | 100 |
The surprisingly high throughput for Envoy is explained by the substantial amount of failed requests (which was consistent over multiple runs). I did not try increasing concurrent clients, as I assumed that would further degrade the results (given more clients * more streams).
It looks like all solutions deal poorly with an increasing amount of max concurrent streams. Nginx seems to be the most reliable in terms of throughput and TTFB in this scenario. Given I'm using different backends (i.e. httpd for nginx and simplehttp2server for both envoy/haproxy), I'll try to assess if there is a backend limitation going on here.
Note that using httpd as the backend was the case for nginx from the start.
docker run --rm -it --network=host h2load-http3 -m1 -c100 --duration=600 --npn-list h3 https://envoy.io
finished in 600.04s, 885.30 req/s, 565.02KB/s
requests: 531182 total, 531282 started, 531182 done, 531175 succeeded, 7 failed, 0 errored, 0 timeout
status codes: 531179 2xx, 0 3xx, 0 4xx, 7 5xx
traffic: 331.07MB (347150046) total, 4.13MB (4329421) headers (space savings 94.26%), 320.66MB (336235073) data
UDP datagram: 1315737 sent, 1825199 received
min max mean sd +/- sd
time for request: 44.29ms 828.50ms 112.87ms 25.00ms 76.08%
time for connect: 104.72ms 512.28ms 401.39ms 81.85ms 56.00%
time to 1st byte: 414.16ms 683.57ms 536.35ms 83.36ms 56.00%
req/s : 8.71 9.12 8.85 0.10 66.00%
docker run --rm -it --network=host h2load-http3 -m50 -c100 --duration=600 --npn-list h3 https://envoy.io
finished in 600.07s, 3736.18 req/s, 655.75KB/s
requests: 2241708 total, 2246708 started, 2241708 done, 326895 succeeded, 1914813 failed, 0 errored, 0 timeout
status codes: 326925 2xx, 0 3xx, 0 4xx, 1914819 5xx
traffic: 384.23MB (402890993) total, 17.24MB (18081472) headers (space savings 93.14%), 346.37MB (363195742) data
UDP datagram: 188617 sent, 469440 received
min max mean sd +/- sd
time for request: 45.12ms 11.28s 1.33s 1.60s 82.66%
time for connect: 74.50ms 2.75s 993.31ms 613.24ms 31.00%
time to 1st byte: 555.01ms 6.67s 2.56s 1.48s 69.00%
req/s : 33.51 42.47 37.36 1.63 70.00%
docker run --rm -it --network=host h2load-http3 -m1 -c100 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.06s, 1267.84 req/s, 984.31KB/s
requests: 760706 total, 760806 started, 760706 done, 760706 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 760706 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 576.75MB (604761270) total, 116.07MB (121712960) headers (space savings -3.90%), 456.32MB (478484074) data
UDP datagram: 786631 sent, 1587185 received
min max mean sd +/- sd
time for request: 40.77ms 513.23ms 58.31ms 14.26ms 84.27%
time for connect: 55.04ms 1.11s 394.40ms 448.80ms 71.00%
time to 1st byte: 175.01ms 1.20s 477.20ms 400.47ms 77.17%
req/s : 0.00 17.66 13.71 6.15 80.00%
docker run --rm -it --network=host h2load-http3 -m50 -c100 --duration=600 --npn-list h3 https://haproxy.io
finished in 600.05s, 67.75 req/s, 52.60KB/s
requests: 40651 total, 45651 started, 40651 done, 40651 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 40651 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 30.82MB (32317545) total, 6.20MB (6504160) headers (space savings -3.90%), 24.38MB (25569479) data
UDP datagram: 7551 sent, 48259 received
min max mean sd +/- sd
time for request: 42.83ms 18.39s 346.95ms 1.64s 97.34%
time for connect: 56.36ms 1.07s 172.90ms 207.51ms 95.00%
time to 1st byte: 224.63ms 18.19s 3.28s 4.15s 90.22%
req/s : 0.00 100.55 7.35 17.67 91.00%
Switched everyone to have httpd (http/1.1) upstream, so the test scenario is absolutely even. No big improvements here, as Nginx still fares best. Envoy fails a lot of requests and this could be config-related (might be worth looking into).
Service | Protocol | M | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients |
---|---|---|---|---|---|---|
Nginx | HTTP/3 | 1 | 326.88ms | 0.00% | 8.53 | 100 |
Nginx | HTTP/3 | 50 | 2.22s | 3.00% | 9.67 | 100 |
Envoy | HTTP/3 | 1 | 536.35ms | 1.00% | 8.85 | 100 |
Envoy | HTTP/3 | 50 | 2.56s | 86.00% | 37.36 | 100 |
HAProxy | HTTP/3 | 1 | 477.20ms | 0.00% | 13.71 | 100 |
HAProxy | HTTP/3 | 50 | 3.28s | 0.00% | 7.35 | 100 |
Decided to verify if enabling GSO would affect performance values for maximum concurrent streams given the poor results. The idea came from this blog post.
For now, I only tried nginx as it was the best performer (also the most comparable to itself as I only used one backend for it in these benchmarks). See the posts above.
I enabled GSO with ethtool -K eth2 gso on. I also changed the nginx config as shown here.
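For the record, the toggle plus a quick way to verify it took (interface name is machine-specific; the linked nginx config is authoritative for the server-side part):
sudo ethtool -K eth2 gso on
ethtool -k eth2 | grep generic-segmentation-offload   # should report: on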
rmem tweaks
I ran this: sudo sysctl -w net.core.rmem_max=26214400 && sudo sysctl -w net.core.rmem_default=26214400. This way, I could test both changes and see if they provided a better cumulative gain.
docker run --rm -it --network=host h2load-http3 -m1 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.08s, 166.53 req/s, 131.01KB/s
requests: 99921 total, 100000 started, 99921 done, 99921 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 76.76MB (80491700) total, 24.13MB (25300000) headers (space savings 27.92%), 51.78MB (54300000) data
UDP datagram: 224465 sent, 300059 received
min max mean sd +/- sd
time for request: 64.78ms 3.44s 123.58ms 53.83ms 71.65%
time for connect: 52.47ms 254.05ms 153.00ms 59.17ms 57.00%
time to 1st byte: 254.18ms 417.71ms 331.06ms 45.03ms 53.00%
req/s : 7.87 8.50 8.08 0.11 73.00%
docker run --rm -it --network=host h2load-http3 -m10 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.04s, 166.66 req/s, 130.98KB/s
requests: 99999 total, 100135 started, 99999 done, 99968 succeeded, 31 failed, 3 errored, 0 timeout
status codes: 99972 2xx, 0 3xx, 0 4xx, 28 5xx
traffic: 76.75MB (80474425) total, 24.12MB (25294428) headers (space savings 27.92%), 51.77MB (54289216) data
UDP datagram: 228480 sent, 266165 received
min max mean sd +/- sd
time for request: 43.38ms 61.29s 1.11s 1.31s 97.68%
time for connect: 51.56ms 249.94ms 151.07ms 58.75ms 57.00%
time to 1st byte: 267.12ms 4.37s 1.07s 869.09ms 85.00%
req/s : 8.68 9.57 8.96 0.16 71.00%
docker run --rm -it --network=host h2load-http3 -m50 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.08s, 116.19 req/s, 90.20KB/s
requests: 70143 total, 71930 started, 70143 done, 68529 succeeded, 1614 failed, 432 errored, 0 timeout
status codes: 68529 2xx, 0 3xx, 0 4xx, 1182 5xx
traffic: 52.85MB (55418715) total, 16.60MB (17401665) headers (space savings 27.98%), 35.67MB (37401227) data
UDP datagram: 142350 sent, 160535 received
min max mean sd +/- sd
time for request: 41.36ms 81.46s 3.07s 10.53s 96.20%
time for connect: 55.24ms 254.08ms 153.21ms 59.11ms 58.00%
time to 1st byte: 265.83ms 26.89s 2.17s 2.89s 98.85%
req/s : 0.00 16.94 10.06 5.44 75.00%
docker run --rm -it --network=host h2load-http3 -m100 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.08s, 59.65 req/s, 46.47KB/s
requests: 37177 total, 44882 started, 37177 done, 35352 succeeded, 1825 failed, 1389 errored, 0 timeout
status codes: 35352 2xx, 0 3xx, 0 4xx, 436 5xx
traffic: 27.23MB (28548762) total, 8.55MB (8967600) headers (space savings 27.96%), 18.38MB (19268936) data
UDP datagram: 66069 sent, 75999 received
min max mean sd +/- sd
time for request: 42.62ms 68.27s 3.06s 8.27s 94.13%
time for connect: 53.07ms 252.59ms 151.83ms 59.19ms 58.00%
time to 1st byte: 266.24ms 15.73s 1.80s 1.89s 98.36%
req/s : 0.00 63.25 10.14 12.60 85.00%
GSO without rmem tweaks
Just to make sure tweaking both GSO and rmem produces a better outcome than just one of the tweaks.
docker run --rm -it --network=host h2load-http3 -m100 -c100 --duration=600 --npn-list h3 https://nginx.io
finished in 600.09s, 69.50 req/s, 53.15KB/s
requests: 43365 total, 50839 started, 43365 done, 40140 succeeded, 3225 failed, 1665 errored, 0 timeout
status codes: 40140 2xx, 0 3xx, 0 4xx, 1560 5xx
traffic: 31.15MB (32657987) total, 9.77MB (10239660) headers (space savings 28.06%), 21.03MB (22052790) data
UDP datagram: 81077 sent, 91223 received
min max mean sd +/- sd
time for request: 42.66ms 66.22s 4.03s 11.86s 94.39%
time for connect: 56.00ms 7.07s 1.45s 1.62s 73.00%
time to 1st byte: 154.81ms 8.09s 2.01s 1.40s 85.45%
req/s : 0.00 39.86 8.70 10.17 84.00%
Didn't explore this further as this result seemed to indicate worse performance.
Service | Protocol | M | TTFB Mean | Failure Rate | Reqs/S | Concurrent Clients | Tweaks |
---|---|---|---|---|---|---|---|
Nginx | HTTP/3 | 1 | 331.06ms | 0.00% | 8.08 | 100 | GSO+rmem |
Nginx | HTTP/3 | 1 | 326.88ms | 0.00% | 8.53 | 100 | rmem |
Nginx | HTTP/3 | 10 | 1.07s | 1.00% | 8.96 | 100 | GSO+rmem |
Nginx | HTTP/3 | 10 | 1.25s | 1.00% | 9.39 | 100 | rmem |
Nginx | HTTP/3 | 50 | 2.17s | 3.00% | 10.16 | 100 | GSO+rmem |
Nginx | HTTP/3 | 50 | 2.22s | 3.00% | 9.67 | 100 | rmem |
Nginx | HTTP/3 | 100 | 1.71s | 6.00% | 9.31 | 100 | rmem |
Nginx | HTTP/3 | 100 | 1.80s | 5.00% | 10.14 | 100 | GSO+rmem |
Nginx | HTTP/3 | 100 | 2.01s | 8.00% | 8.70 | 100 | GSO |
Results seem better for GSO+rmem, but the differences aren't enough to exclude sampling error. I find this inconclusive, but if I had to choose I would take both tweaks.
http3 could be a huge, quick bang for our buck to significantly improve ttfb without adding more nodes/PoPs
adding http3 support should be broken into three pieces:
[x] do a quick benchmark of nginx's tcp+tls vs http3. for example, fire up an aws ec2 instance far away and run two web servers there, one standard tcp+tls with nginx and the other with quic/http3. then benchmark how long it takes to establish connections to each of those web servers in a browser that supports http3, like chrome (https://caniuse.com/http3)
http3 should be much faster. but is it? this can also serve as a quick test-bed of the variegated http3 tools, libraries, and software that exist right now. see below
ping @DiegoRBaquero to get a production ssl cert to use
[x] understand the architecture of the l1 and investigate the http3 landscape to determine the best tool, library, or software to add http3 support to the l1. various tools:
- caddy -> nginx -> l1 shim
- litespeed -> nginx -> l1 shim. fwiw according to https://w3techs.com/technologies/segmentation/ce-quic/web_server the majority of quic traffic online is currently served from litespeed. though if cloudflare supports http3, which it supposedly does, these stats dont make sense unless cloudflare uses litespeed. which they dont
- discuss viable avenues for http3 implementation with @gruns and @DiegoRBaquero
note here that, for simplicity, any non-nginx http3 implementation will likely be replaced with nginx's native http3 support once ready, so long as nginx's implementation suffices. so we shouldn't get too crazy adding http3 support wrt the amount of time, effort, or complexity undertaken here
[x] once http3 has shown itself superior and we've determined the implementation battleplan, add http3 support to the l1 node and ship it, baby 🚀