Closed by fasterthanlime 3 months ago.
Comparing first-h2load-benchmarks (32fe186) with main (1162550): ✅ 10 untouched benchmarks.
So, on macOS, where the io-uring codepath is not used, for a tiny HTTP/2 response, we're somewhat competitive with hyper.
On Linux, however, where the io-uring codepath is used, we see a ~13µs mean request time for hyper and a ~44.5ms mean request time for us.
This isn't entirely surprising, given that the whole io-uring codepath is really naive right now: fluke-io-uring-async is adapted from https://github.com/thomasbarrett/io-uring-async, which was mentioned in a tokio-uring issue.
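For context, here's roughly what io_uring-flavored async I/O looks like in Rust. This is a minimal sketch using the tokio-uring crate rather than fluke-io-uring-async, and the address, buffer size, and echo logic are made up for illustration; the point is the buffer-ownership model, where buffers move into each operation and come back alongside the result.

```rust
// Sketch only (assumed dep: tokio-uring = "0.4"); not fluke's actual io_uring adapter.
use tokio_uring::net::TcpListener;

fn main() -> std::io::Result<()> {
    tokio_uring::start(async {
        let listener = TcpListener::bind("127.0.0.1:3000".parse().unwrap())?;
        loop {
            let (stream, _peer) = listener.accept().await?;
            tokio_uring::spawn(async move {
                // With io_uring, the kernel owns the buffer while an operation
                // is in flight, so the buffer is passed by value and handed
                // back together with the result.
                let buf = vec![0u8; 4096];
                let (res, mut buf) = stream.read(buf).await;
                let n = match res {
                    Ok(n) => n,
                    Err(_) => return,
                };
                // Echo back only the bytes we actually read.
                buf.truncate(n);
                let _ = stream.write(buf).await;
            });
        }
    })
}
```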
The next step is to look at a flamegraph / wield perf to find out where that time is spent. Exciting!
Time was spent waiting, and, well, it was waiting for Nagle, of course (TCP_NODELAY). A 40-50ms delay should've been a strong enough hint, but I guess I've been busy thinking about thread-locals instead.
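For reference, disabling Nagle from Rust is a one-liner on the accepted socket. This is a minimal sketch assuming a plain tokio accept loop, not fluke's actual accept path; the address is made up.

```rust
// Assumes tokio with the "full" feature set.
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:3000").await?;
    loop {
        let (stream, _peer) = listener.accept().await?;
        // Disable Nagle's algorithm: otherwise small writes (like tiny HTTP/2
        // frames) can be held back until the previous segment is ACKed, which
        // interacts badly with delayed ACKs and produces 40-50ms stalls.
        stream.set_nodelay(true)?;
        // ... hand `stream` off to the HTTP/2 connection handler here ...
    }
}
```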
The mean request times are much closer now: 13µs for hyper, 26µs for us. I have several ideas on how to improve this benchmark, which, like all benchmarks, tells lies.
Whoops, that PR did not actually include set_nodelay.
Not run in CI yet.