compio-rs / compio

A thread-per-core Rust runtime with IOCP/io_uring/polling.
MIT License

feat: QUIC #282

Closed AsakuraMizu closed 1 month ago

AsakuraMizu commented 2 months ago

This PR introduces QUIC (and HTTP/3) support via quinn-proto.

Features ported from quinn-udp

Features ported from quinn

Almost all features except:

Features ported from h3-quinn

Performance

quinn-udp uses the recvmmsg syscall on supported platforms to improve performance. I did not port this feature because io_uring has no corresponding opcode. As a result, our implementation is slightly slower than quinn when using the io-uring driver (and, of course, much slower when using the polling driver). Using a multishot opcode may improve performance, but that is still pending (#104). On platforms that do not support recvmmsg there is still a gap to quinn, and unfortunately the benchmark program does not run on Windows. I think there may be some issues in my benchmark code.
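To make the cost concrete: without a batched receive such as recvmmsg, every datagram needs its own receive call. The following is a minimal sketch using only std::net::UdpSocket on loopback. It is not code from this PR or from quinn-udp, and `drain_one_by_one` and `loopback_demo` are hypothetical helpers introduced for illustration.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

/// Receives `expected` datagrams with one `recv_from` call per datagram and
/// returns (number of receive calls, total payload bytes).
/// Hypothetical helper, for illustration only.
fn drain_one_by_one(socket: &UdpSocket, expected: usize) -> io::Result<(usize, usize)> {
    let mut buf = [0u8; 1500];
    let mut calls = 0;
    let mut bytes = 0;
    while calls < expected {
        // Each iteration is a separate receive call for a single datagram.
        let (len, _peer) = socket.recv_from(&mut buf)?;
        calls += 1;
        bytes += len;
    }
    Ok((calls, bytes))
}

/// Sends `count` datagrams of `size` bytes over loopback, then drains them.
fn loopback_demo(count: usize, size: usize) -> io::Result<(usize, usize)> {
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    // Fail with an error instead of blocking forever if a datagram is lost.
    receiver.set_read_timeout(Some(Duration::from_secs(2)))?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    let target = receiver.local_addr()?;
    let payload = vec![0xAB_u8; size];
    for _ in 0..count {
        sender.send_to(&payload, target)?;
    }
    drain_one_by_one(&receiver, count)
}
```

Calling `loopback_demo(4, 100)` performs four receive calls for four datagrams; a batched receive would aim to fetch several datagrams per call instead.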

AsakuraMizu commented 2 months ago

Some blocking issues:

  1. Dual-stack socket?
    Fact: IPV6_V6ONLY defaults to true only on Windows, and dual-stack is not supported by OpenBSD and FreeBSD. If we provide methods for creating dual-stack sockets, should we also provide methods for v6-only sockets?
    socket2 allows calling set_only_v6 after new and before bind, but our design does not welcome this.
    I prefer to make dual-stack the default on all platforms, or make it configurable via a feature flag.
  2. The I/O functions on RecvStream/SendStream are not satisfactory. Specifically, SendStream::write will inevitably perform a copy, and the zero-copy method write_chunks requires the use of bytes. This does not fit our AsyncWrite trait well, and I'm considering removing the impl. The same applies to RecvStream.
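The copy in point 2 can be sketched without quinn at all. In the example below, `SendQueue` is a hypothetical type and `Arc<[u8]>` is only a stand-in for a reference-counted buffer such as the ones from bytes; none of this is the actual compio-quic or quinn API. It shows why a method that takes a borrowed slice has to copy, while one that takes an owned buffer does not.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Hypothetical send queue; `Arc<[u8]>` stands in for a reference-counted
/// byte buffer.
#[derive(Default)]
struct SendQueue {
    chunks: VecDeque<Arc<[u8]>>,
}

impl SendQueue {
    /// Borrowed input: the slice is only valid for the duration of the call,
    /// so the queue has to copy it into storage it owns.
    fn write(&mut self, data: &[u8]) -> usize {
        self.chunks.push_back(Arc::from(data)); // allocates and copies
        data.len()
    }

    /// Owned input: the caller hands over a reference-counted buffer, so the
    /// queue keeps a handle to the same allocation without copying.
    fn write_chunk(&mut self, chunk: Arc<[u8]>) -> usize {
        let len = chunk.len();
        self.chunks.push_back(chunk);
        len
    }

    /// Address of the bytes of the most recently queued chunk.
    fn last_ptr(&self) -> Option<*const u8> {
        self.chunks.back().map(|chunk| chunk.as_ptr())
    }
}
```

After `write`, `last_ptr()` points at a fresh allocation; after `write_chunk`, it points at the caller's original buffer. A slice-based write trait can only offer the first behaviour.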

Berrysoft commented 2 months ago

Check the benchmark, please. It stack overflows on my Windows machine.

AsakuraMizu commented 2 months ago

> Check the benchmark, please. It stack overflows on my Windows machine.

It turns out that the data is just too big. But while adjusting the size I found that the size has a big impact on performance (I mean, on the gap between our implementation and quinn). For example, if large_data has a length of 1024 * 64 = 65536, compio-quic is 1.4x faster than quinn on echo-large-data-1-stream. But if large_data has a length of 1024 * 16 = 16384, then quinn is 0.7x faster than us on the same test case. (These numbers come from runs on Windows.) Maybe the benchmarks still have a lot of room for improvement.
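One way to check the size sensitivity is to sweep the payload size and compare timings as ratios. The sketch below is not the benchmark code from this PR: `time_once`, `speedup`, `throughput_mib_s` and `sweep` are hypothetical helpers, and the `echo` closure stands in for whatever round trip is being measured. A single timed run is noisy, so a real comparison would repeat each measurement many times.

```rust
use std::time::{Duration, Instant};

/// The two payload sizes mentioned above: 16 KiB and 64 KiB.
const SIZES: [usize; 2] = [1024 * 16, 1024 * 64];

/// Runs `work` once and returns how long it took.
fn time_once<F: FnMut()>(mut work: F) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}

/// Ratio of baseline time to candidate time.
/// Values above 1.0 mean the candidate finished faster than the baseline.
fn speedup(baseline: Duration, candidate: Duration) -> f64 {
    baseline.as_secs_f64() / candidate.as_secs_f64()
}

/// Throughput in MiB/s for `bytes` transferred in `elapsed`.
fn throughput_mib_s(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / (1024.0 * 1024.0) / elapsed.as_secs_f64()
}

/// Times `echo` once for every size in `SIZES`.
fn sweep<F: FnMut(&[u8])>(mut echo: F) -> Vec<(usize, Duration)> {
    SIZES
        .iter()
        .map(|&size| {
            // The payload is a Vec, so it is allocated on the heap.
            let data = vec![0u8; size];
            (size, time_once(|| echo(&data)))
        })
        .collect()
}
```

Running `sweep` once per implementation and passing the paired durations to `speedup` gives one ratio per size, which makes it easy to see where the ratio crosses 1.0.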