Open dignifiedquire opened 1 year ago
Hmm that seems quite low. You can try using our netbench tool. I'm able to achieve almost 8Gbit/s on my machine:
$ cd netbench
$ cargo build --release
$ ./target/release/netbench-scenarios --request_response.response_size=10GiB
$ PORT=3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-s2n-quic-server
$ SERVER_0=localhost:3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-s2n-quic-client
0:00:01.000382 throughput: rx=983.06MBps tx=999Bps
0:00:02.001212 throughput: rx=930.97MBps tx=0Bps
0:00:03.002278 throughput: rx=931.40MBps tx=0Bps
0:00:04.003062 throughput: rx=928.28MBps tx=0Bps
0:00:05.004160 throughput: rx=930.81MBps tx=0Bps
0:00:06.005241 throughput: rx=928.78MBps tx=0Bps
0:00:07.006382 throughput: rx=930.50MBps tx=0Bps
0:00:08.007209 throughput: rx=929.02MBps tx=0Bps
0:00:09.008089 throughput: rx=931.42MBps tx=0Bps
0:00:10.009223 throughput: rx=930.06MBps tx=0Bps
0:00:11.010127 throughput: rx=933.53MBps tx=0Bps
One thing to note is that macOS doesn't support the same optimized UDP APIs as Linux (namely sendmmsg and GSO), so throughput there will generally be lower as well.
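For context, GSO (generic segmentation offload) lets an application hand the kernel one large buffer per send call and have the kernel split it into MTU-sized UDP packets, so far fewer syscalls are needed per gigabyte. Here is a minimal sketch of what enabling it looks like on a plain std UdpSocket (Linux only, assumes the libc crate; this only illustrates the socket option, it is not how s2n-quic wires it up internally):

use std::net::UdpSocket;
use std::os::unix::io::AsRawFd;

/// Ask the kernel to segment each large datagram we hand it into
/// `segment_size`-byte UDP packets (Linux UDP_SEGMENT, i.e. GSO).
fn enable_udp_gso(socket: &UdpSocket, segment_size: libc::c_int) -> std::io::Result<()> {
    let ret = unsafe {
        libc::setsockopt(
            socket.as_raw_fd(),
            libc::SOL_UDP,
            libc::UDP_SEGMENT,
            &segment_size as *const libc::c_int as *const libc::c_void,
            std::mem::size_of_val(&segment_size) as libc::socklen_t,
        )
    };
    if ret == 0 { Ok(()) } else { Err(std::io::Error::last_os_error()) }
}

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    // 1200 bytes is a conservative QUIC-sized segment; a single send can now
    // carry many segments' worth of payload in one syscall.
    enable_udp_gso(&socket, 1200)?;
    Ok(())
}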
However, it's important to keep in mind that cleartext vs encrypted traffic is not really comparable; the cost of encryption is quite high on raw throughput. For example, comparing plain TCP to TCP/TLS shows a similar effect.
$ PORT=3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-s2n-tls-server
$ SERVER_0=localhost:3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-s2n-tls-client
0:00:01.000698 throughput: rx=1.33GBps tx=999Bps
0:00:02.001713 throughput: rx=1.31GBps tx=0Bps
0:00:03.002700 throughput: rx=1.31GBps tx=0Bps
0:00:04.003702 throughput: rx=1.31GBps tx=0Bps
0:00:05.004695 throughput: rx=1.31GBps tx=0Bps
0:00:06.005701 throughput: rx=1.31GBps tx=0Bps
0:00:07.006694 throughput: rx=1.31GBps tx=0Bps
0:00:08.007694 throughput: rx=1.31GBps tx=0Bps
$ PORT=3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-tcp-server
$ SERVER_0=localhost:3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-tcp-client
0:00:01.000366 throughput: rx=4.79GBps tx=999Bps
0:00:02.001359 throughput: rx=5.13GBps tx=0Bps
0:00:03.002367 throughput: rx=4.99GBps tx=0Bps
0:00:04.003366 throughput: rx=5.07GBps tx=0Bps
0:00:05.004367 throughput: rx=4.93GBps tx=0Bps
0:00:06.005389 throughput: rx=5.07GBps tx=0Bps
We do have some optimizations planned for this year to close the gap between s2n-quic and TCP/TLS, so that should improve.
Thanks for the quick response.
This is what I get running on my Linux machine:
0:00:09.009077 throughput: rx=588.97MBps tx=0Bps
0:00:10.010063 throughput: rx=587.80MBps tx=0Bps
0:00:11.011046 throughput: rx=588.30MBps tx=0Bps
0:00:12.012046 throughput: rx=586.50MBps tx=0Bps
0:00:13.013223 throughput: rx=587.86MBps tx=0Bps
0:00:14.013751 throughput: rx=590.16MBps tx=0Bps
0:00:15.015065 throughput: rx=589.67MBps tx=0Bps
0:00:16.015823 throughput: rx=589.25MBps tx=0Bps
0:00:17.016755 throughput: rx=589.14MBps tx=0Bps
0:00:18.017976 throughput: rx=587.93MBps tx=0Bps
0:00:01.000043 throughput: rx=1.09GBps tx=999Bps
0:00:02.001035 throughput: rx=1.14GBps tx=0Bps
0:00:03.002024 throughput: rx=1.14GBps tx=0Bps
0:00:04.003027 throughput: rx=1.14GBps tx=0Bps
0:00:05.004023 throughput: rx=1.14GBps tx=0Bps
0:00:06.005036 throughput: rx=1.14GBps tx=0Bps
0:00:07.006032 throughput: rx=1.14GBps tx=0Bps
0:00:08.007030 throughput: rx=1.14GBps tx=0Bps
0:00:09.008024 throughput: rx=1.14GBps tx=0Bps
0:00:01.000624 throughput: rx=3.22GBps tx=999Bps
0:00:02.001617 throughput: rx=3.31GBps tx=0Bps
0:00:03.002623 throughput: rx=3.32GBps tx=0Bps
On my Mac I am getting
Error: "The connection was closed because the handshake took longer than the max handshake duration of 10s"
on the client side when running the QUIC driver.
Our macOS bindings have some issues with dual-stack IP sockets. For some reason the socket isn't able to receive responses. This is noted in the netbench readme:
https://github.com/aws/s2n-quic/tree/main/netbench/netbench-driver#running-driver-tests
Note: if the netbench driver is being run on a mac, set the local IP on the client driver to 0.0.0.0 as follows: --local-ip 0.0.0.0
We have a pending issue to investigate this and fix it.
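Concretely, the client invocation from above just gains that flag (same scenario file as before):
$ SERVER_0=localhost:3000 SCENARIO=./target/netbench/request_response.json ./target/release/netbench-driver-s2n-quic-client --local-ip 0.0.0.0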
Are your plans to run over localhost in production? Or is this just for getting an idea of performance? Generally, UDP loopback is more expensive than sending/receiving on an actual NIC, especially with GSO support.
> Are your plans to run over localhost in production?
My use for localhost is twofold: (1) I usually use it as a default benchmark to measure the overhead of everything else once the network is "removed"; (2) I was experimenting with QUIC as an RPC layer, in which case it would be used both on localhost and on a local network.
> Our macOS bindings have some issues with dual-stack IP sockets. For some reason the socket isn't able to receive responses. This is noted in the netbench readme:
Thanks, I missed that
I have a very-much-WIP (only works on Linux ATM) branch that is able to push 16Gbps over localhost on my machine, which doubles what we do today and actually exceeds the perf of TLS.
https://github.com/aws/s2n-quic/tree/camshaft/multi-socket
I'm hoping to get all of this cleaned up and merged in the coming weeks.
@camshaft very cool, any high level comment on what you did to make this happen?
There's a few things in there
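For anyone curious about the general direction: one well-known way to scale a single UDP endpoint on Linux is to bind several sockets to the same port with SO_REUSEPORT, so the kernel fans received packets out across them and each socket can be serviced by its own thread. The sketch below shows only that generic technique (using the socket2 crate with its "all" feature); it is not a description of what the multi-socket branch actually does.

use socket2::{Domain, Protocol, Socket, Type};
use std::net::{SocketAddr, UdpSocket};

/// Create `n` UDP sockets all bound to the same address via SO_REUSEPORT.
/// The kernel hashes incoming packets on the 4-tuple, so this mainly helps
/// spread load when there are many flows/connections.
fn reuseport_sockets(addr: SocketAddr, n: usize) -> std::io::Result<Vec<UdpSocket>> {
    (0..n)
        .map(|_| -> std::io::Result<UdpSocket> {
            let socket = Socket::new(Domain::for_address(addr), Type::DGRAM, Some(Protocol::UDP))?;
            socket.set_reuse_port(true)?; // must be set before bind
            socket.bind(&addr.into())?;
            Ok(socket.into())
        })
        .collect()
}

fn main() -> std::io::Result<()> {
    let sockets = reuseport_sockets("127.0.0.1:4433".parse().unwrap(), 4)?;
    // Each socket could now be handed to its own worker thread / event loop.
    println!("bound {} sockets", sockets.len());
    Ok(())
}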
I have been testing s2n-quic on localhost, and am seeing a speed of around 1.5 Gbit/s. Running with iperf3 on the same machine with UDP, I am seeing more than 6 Gbit/s. I was wondering if there is something I could adjust in the configuration to improve this, or if this is a known issue. Some additional info: I was testing with the echo example.
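For reference, a UDP loopback baseline with iperf3 along those lines would look something like the following (the exact flags used aren't in the report; -u selects UDP and -b 0 removes the default UDP rate cap):
$ iperf3 -s
$ iperf3 -c 127.0.0.1 -u -b 0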