bearcove / loona

HTTP 1+2 in Rust, with io_uring & ktls
https://docs.rs/loona
Apache License 2.0

fix: Set TCP_NODELAY on io_uring path #209

fasterthanlime closed this 3 months ago

fasterthanlime commented 3 months ago

Not run in CI yet.

codspeed-hq[bot] commented 3 months ago

CodSpeed Performance Report

Merging #209 will not alter performance

Comparing first-h2load-benchmarks (32fe186) with main (1162550)

Summary

✅ 10 untouched benchmarks

fasterthanlime commented 3 months ago

So, on macOS, where the io-uring codepath is not used, for a tiny HTTP/2 response, we're somewhat competitive with hyper.
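One common way for a crate to use io_uring on Linux and a different codepath on macOS is compile-time gating with cfg attributes. This thread doesn't show how loona actually makes that choice, so the following is only a minimal sketch that assumes compile-time selection. The function name io_backend and the labels it returns are hypothetical, not loona's API.

```rust
// Hypothetical sketch: compile-time selection of an I/O backend.
// Names are illustrative only; this is not loona's actual code.

#[cfg(target_os = "linux")]
fn io_backend() -> &'static str {
    // On Linux, the io_uring-based codepath would be compiled in.
    "io_uring"
}

#[cfg(not(target_os = "linux"))]
fn io_backend() -> &'static str {
    // Elsewhere (e.g. macOS), fall back to a readiness-based codepath.
    "readiness-based fallback"
}

fn main() {
    println!("selected I/O backend: {}", io_backend());
}
```

Because the choice happens at compile time, a macOS benchmark never exercises the io_uring code at all. That is why the two platforms can behave so differently.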

On Linux, however, where the io-uring codepath is used, we have ~13µs mean request time for hyper, and ~44.5ms mean request time for us.

This isn't entirely surprising, given that the whole io-uring codepath is really naive right now: fluke-io-uring-async is adapted from https://github.com/thomasbarrett/io-uring-async, which was mentioned in a tokio-uring issue.

The next steps are looking at flamegraph / wielding perf to find out where that time is spent. Exciting!

fasterthanlime commented 3 months ago

> The next steps are looking at flamegraph / wielding perf to find out where that time is spent. Exciting!

Time was spent waiting, and, well, it was waiting on Nagle's algorithm, of course (hence TCP_NODELAY). A 40-50ms delay, the classic signature of Nagle's algorithm interacting with delayed ACKs, should've been a strong enough hint, but I guess I've been busy thinking about thread-locals instead.
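For reference, Rust's standard library disables Nagle's algorithm on a socket with a single call, TcpStream::set_nodelay. The sketch below assumes, hypothetically, that the io_uring path takes ownership of a raw file descriptor. It therefore sets the option before the fd is handed over. accept_nodelay is an illustrative name, not loona's code.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::os::unix::io::{FromRawFd, IntoRawFd, RawFd};

/// Hypothetical helper: accept one connection, disable Nagle's algorithm on it,
/// and release the raw fd (assumed to be handed to an io_uring driver next).
fn accept_nodelay(listener: &TcpListener) -> io::Result<RawFd> {
    let (stream, _peer) = listener.accept()?;
    // TCP_NODELAY: send small writes (e.g. a tiny HTTP/2 response) right away
    // instead of holding them back until earlier data has been ACKed.
    stream.set_nodelay(true)?;
    // The option is stored on the kernel socket, so it survives the
    // conversion from TcpStream to a bare fd.
    Ok(stream.into_raw_fd())
}

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let _client = TcpStream::connect(listener.local_addr()?)?;
    let fd = accept_nodelay(&listener)?;

    // Take ownership back (closing the fd on drop) and confirm the option stuck.
    // Safety: `fd` came from `into_raw_fd` above and is owned by nothing else.
    let server_side = unsafe { TcpStream::from_raw_fd(fd) };
    println!("TCP_NODELAY enabled: {}", server_side.nodelay()?);
    Ok(())
}
```

Setting the option once, right after accept, covers every response sent on that connection. The trade-off is that the connection may put more small packets on the wire.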

The mean request times are much closer now: 13µs for hyper, 26µs for us. I have several ideas on how to improve this benchmark, which, like all benchmarks, tells lies.

fasterthanlime commented 3 months ago

Whoops, that PR did not actually include the set_nodelay call.