Winters123 / QUIC-measurement-kit

scripts & tools for QUIC performance profiling
MIT License

Testing Other Implementations #4

Open victorstewart opened 3 years ago

victorstewart commented 3 years ago

Thanks for the talk, it was extremely eye-opening!

I don't have access to a testbed, so I was wondering what the timeline is for your plans to test Quiche?

Winters123 commented 3 years ago

Thanks for your interest in this work! I'm happy that it can help people understand QUIC better from a performance perspective.

We are currently testing Cloudflare's quiche and should have results pretty soon. For Google's quiche, we'll have to upgrade our OS before we can test it. (That will take some time, as other projects rely on the current OS.)

Is there any dimension of quiche that you are particularly interested in?

victorstewart commented 3 years ago

@Winters123

Actually, yes! How quiche (and the others) perform under at least 50,000, but ideally up to 100,000, mostly-quiet simultaneous connections, maybe in 1,000-connection increments. I'd need at least 50,000 before I could move production workloads onto QUIC. (I'm talking about a single-threaded instance running on one logical core.) I manage tons of long-lived, mostly idle connections in order to enable realtime delivery of data from server -> client.

So essentially memory usage, throughput, and latency as the connection count grows.

And throughput as packet loss and reordering rates rise, as you analyzed in the talk, is of course crucial as well.

I just began a discussion with the picoquic maintainer about this simultaneous-connections topic after seeing that pico was the only implementation to really stand up under distress. I thought maybe a static data rate spread over ever more connections would best isolate the dynamics of many connections.
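That fixed-aggregate-rate idea could be sketched as a simple sweep plan. The 1 Gbps total and 1,000-connection step below are placeholder numbers, not anything from the talk:

```python
# Sketch of a connection-count sweep at a fixed aggregate data rate.
# All numbers are hypothetical placeholders for a real test plan.

def sweep_plan(total_rate_mbps: float, max_conns: int, step: int = 1000):
    """Yield (n_conns, per_conn_rate_mbps) pairs so the aggregate
    offered load stays constant while the connection count grows."""
    for n in range(step, max_conns + 1, step):
        yield n, total_rate_mbps / n

plan = list(sweep_plan(total_rate_mbps=1000.0, max_conns=100_000))
print(plan[0])    # (1000, 1.0)
print(plan[-1])   # (100000, 0.01)
```

Holding total offered load constant means any degradation seen at higher counts comes from per-connection overhead, not from pushing more bytes.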

He thought these would be the key stress points:

> I am not worried about the extra load for attributing a packet to a connection -- this is done with a hash table, and size 50000 would not be too much of an issue. I am also not too worried about the network effects, since we use a single UDP socket for all connections, although there might be something happening in the kernel implementation of UDP and of the ARP caches. I am worried about memory size, since that was not optimized for. I am also worried about other details, such as the shutting down of connections if they are idle for too long, or the extra load of maintenance traffic to avoid the idle mechanism. That tells me that any test should not just create batches of connections, but also simulate what happens when these connections remain idle for at least 10 minutes.

https://github.com/private-octopus/picoquic/issues/1022

Again, thanks so much for this work. I'd foolishly assumed all these implementations were stable, and equivalently so, but I guess we're still in the early days! Almost dived in head first lol.

Winters123 commented 3 years ago

> So essentially memory usage, throughput and latency as the connection count grows. And also throughput as packet loss and reordering rates rise as you analyzed in the talk are crucial as well of course.

Except for memory usage, the tests we are currently doing should cover most of the aspects you are interested in. To be honest, I wasn't planning a >50,000-connection test, but I will definitely add one since you are describing a real scenario. As the picoquic implementer mentioned, though, simply creating batches of connections may not give us enough information.

I feel like there are two potential ways we can consider for the multi-connection test:

  1. We still use connection batches, but create a "gap" for each connection to mimic the idle period (ignoring the applications on top).
  2. We modify the HTTP/3 server provided by quiche and use it as the server, then fire up clients and use a timer in the test script to control the idle period.

I think we could go either way, but the first one is clearly simpler.
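The first option could be sketched roughly like this; the `connect`/`close` callables are stand-ins for whatever client library is under test:

```python
# Rough sketch of option 1: open connections in batches, hold them all
# idle for a configurable "gap", then close them. connect/close are
# placeholders for real QUIC client calls; keep-alives would run
# underneath during the idle period.
import time

def run_batch_test(n_conns, batch_size, idle_gap_s, connect, close):
    conns = []
    for start in range(0, n_conns, batch_size):
        batch = [connect() for _ in range(min(batch_size, n_conns - start))]
        conns.extend(batch)
    time.sleep(idle_gap_s)  # the "idle" period to stress timers/keep-alives
    for c in conns:
        close(c)
    return len(conns)

# Dry run with dummy connect/close callables.
opened = run_batch_test(n_conns=10, batch_size=4, idle_gap_s=0,
                        connect=lambda: object(), close=lambda c: None)
print(opened)  # 10
```

A real harness would sample server memory and throughput during the sleep, which is where the interesting numbers are.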

What do you think?

dtikhonov commented 3 years ago

I'd love to see how lsquic performs. I am willing to dedicate some time to this.

nibanks commented 3 years ago

If there is a sign up sheet being started, I'd like to add msquic to the list.

dtikhonov commented 3 years ago

> Thanks for the talk, it was extremely eye-opening!

I'd also like to hear this talk -- do you know whether it is or will be available on YouTube (or some other public server)?

victorstewart commented 3 years ago

@Winters123

Assuming the real cost of many connections is in fact maintaining idle timers, sending keep-alive packets (idk what they're called in QUIC lol), and shutting down idle connections, I think all we'd have to do is open a bunch of connections... set random idle timeouts... let them keep alive... let them idle out and close... and then measure the throughput impact?

All my clients are mobile so I dynamically update Keep-Alive timeouts for TCP per client based on battery life, anywhere from 2 to 60 minutes generally. But I think we can isolate all the dynamics by staying <= 10 minutes, maybe even shorter.

And we'll need to make sure all implementations use the same interval for sending keep-alive packets, since that's what drives the maintenance volume.

Timer efficiency boils down to using the best timer design, which I believe is currently a hierarchical timer wheel, giving O(1) inserts and deletes. So in theory I don't see why timers would ever need to be a bottleneck.
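For illustration, here is a minimal single-level timer wheel; a real hierarchical wheel cascades several of these levels so it can handle delays longer than one rotation. This is purely a sketch, not code from any QUIC implementation:

```python
# Minimal single-level timer wheel. Insert and cancel are O(1); each
# tick touches only one slot. A hierarchical wheel stacks several of
# these to cover long delays; here delays must be < the slot count.

class TimerWheel:
    def __init__(self, slots: int = 64):
        self.slots = [set() for _ in range(slots)]
        self.now = 0  # current tick

    def add(self, timer_id, delay_ticks: int):
        """O(1): drop the timer into the slot where it will fire."""
        self.slots[(self.now + delay_ticks) % len(self.slots)].add(timer_id)

    def cancel(self, timer_id, slot: int):
        """O(1): remove a pending timer from its slot."""
        self.slots[slot].discard(timer_id)

    def tick(self):
        """Advance one tick and return the set of timers that fired."""
        self.now += 1
        slot = self.now % len(self.slots)
        fired, self.slots[slot] = self.slots[slot], set()
        return fired

wheel = TimerWheel()
wheel.add("keepalive-conn-1", 3)          # fire 3 ticks from now
fired = [wheel.tick() for _ in range(3)]
print(fired)  # [set(), set(), {'keepalive-conn-1'}]
```

With a wheel like this, re-arming 100,000 keep-alive timers per interval is just 100,000 O(1) set operations, which supports the point that timers needn't be the bottleneck.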

And for maintenance traffic to keep idle connections open... maybe the frequency for UDP has to be much higher than for TCP, because NATs and middleboxes idle out UDP connections faster than they would TCP ones?

And I think it's best to avoid HTTP/3 and stick with a QUIC toy server, which for quiche in particular is here: https://github.com/cloudflare/quiche/blob/6d7bab4468a59ff1e48afede6cab1d696c363026/examples/server.c

victorstewart commented 3 years ago

@dtikhonov https://www.youtube.com/watch?v=DS0-NWIL3Js

LPardue commented 3 years ago

Since we're talking Cloudflare quiche, I'd like to clarify that example clients and servers (client, server, http3-client, http3-server, quiche-client, and quiche-server) are not designed to maximise performance. So the conclusions that can be drawn from testing them are going to be limited. The nginx quiche HTTP/3 module and integration are the best option for performance testing.

larseggert commented 3 years ago

The above is likely true for most implementations at the moment. (quant is a bit different, because it tries to be as fast as it can, but also isn't meant to be used in production ever, so many corners were cut.)

LPardue commented 3 years ago

Yep. As @rmarx's EPIQ paper pointed out, implementations are diverse and constantly evolving. It would be super good for reproducibility to list all characteristics of the implementations under test, e.g. the library, the driving application, compiler versions and options, software configuration options, system configuration, etc. Having a testbed project makes that implicit, but it would be fair to include all of these alongside the presentation of results.

nibanks commented 3 years ago

FWIW, I've been working on writing up a simple, general-purpose QUIC testing protocol (not HTTP/3 based). I can officially write it up and share out a spec if others are interested, but briefly, it's something like:

With this general protocol, a generic server can be stood up to handle any number of possible performance tests, driven by the client:

IMO, it should be quite easy to create simple client and server implementations of this that effectively test the performance of different QUIC implementations. Any thoughts?

dtikhonov commented 3 years ago

@nibanks, on one hand, I want to say "I love it!" On the other hand, I wonder: is HTTP/3 really that much overhead that a separate protocol is required to test QUIC transport performance?

nibanks commented 3 years ago

In a word: yes.

A bit longer explanation: I don't own HTTP/3 and the Windows implementation is not open source, and is very tightly coupled with existing HTTP & Windows code. So I cannot have a dependency on that protocol for msquic repo.

nibanks commented 3 years ago

Additionally, I have multiple QUIC based protocols in progress (SMB for instance) that wouldn't benefit from an HTTP/3 layer perf test.

LPardue commented 3 years ago

There are benefits to both levels of testing. I like the idea of an "iperf for QUIC"; I'm not sure how different the proposal is from how we use "HTTP/0.9 over QUIC" for interop testing.

I just wouldn't be surprised if the people who do performance engineering of HTTP/3 production stacks aren't able to also divert effort into performance tuning an application aimed solely at synthetic benchmarking.

Either way, these are great things to think about and discuss.

nibanks commented 3 years ago

I can understand this might be an unwelcome burden for those who own both QUIC and HTTP/3, and for whom HTTP/3 is the primary (only?) protocol. Though in my experience so far with testing both QUIC by itself and QUIC with HTTP/3 code, there is a whole world of difference in the results and bottlenecks. It has been a worthwhile effort to do both, and I would recommend it to anyone working on HTTP/3.

LPardue commented 3 years ago

Agreed, both are useful! I just think it helps to be upfront with the community that QUIC performance metrics don't directly predict real-world application-mapping performance.

Your proposal for "perf" would be good to create a fair baseline to compare TCP+TLS to QUIC.

nibanks commented 3 years ago

I totally agree that a QUIC perf test by itself doesn't convey real-world metrics. That will always require the full scenario and HW configuration to be tested. My goal with this perf protocol is to have "standard QUIC metrics" for a given implementation and to allow easy cross-implementation performance testing.

nibanks commented 3 years ago

FYI, it's a WIP, but I have started a doc for my proposed QUIC perf protocol here. Feel free to direct any specific questions there. I am trying to finish up a first draft today.

Winters123 commented 3 years ago

@dtikhonov @nibanks Thanks! We haven't tested lsquic and msquic yet, but they are definitely on the TODO list. It would be good if lsquic and msquic provided options to record throughput on the client and request count on the server. Then we could obtain these values periodically under different configurations.

dtikhonov commented 3 years ago

Could you be a little more specific: how do you expect throughput to be recorded? Is this something as simple as writing to a log file? If so, what format would you like?

Winters123 commented 3 years ago

@dtikhonov It would be good enough if the throughput could be recorded into a log file every second, and likewise requests per second. A format could be as simple as this:

establish_time: xxxMbps ...
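Assuming one timestamp-plus-rate line per second in that format, post-processing the log could look like this (the exact field layout is my guess from the example above):

```python
# Parse per-second throughput lines of the assumed form
# "<establish_time>: <rate>Mbps" and compute a mean over the run.

def parse_throughput_log(lines):
    samples = []
    for line in lines:
        ts, _, rate = line.partition(":")
        rate = rate.strip()
        if rate.endswith("Mbps"):
            samples.append((float(ts), float(rate[:-4])))
    return samples

# Hypothetical three-second log.
log = ["0.0: 800.0Mbps", "1.0: 900.0Mbps", "2.0: 1000.0Mbps"]
samples = parse_throughput_log(log)
mean = sum(r for _, r in samples) / len(samples)
print(mean)  # 900.0
```

Per-second samples like these also make it easy to plot throughput against the ramping connection count later.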

Winters123 commented 3 years ago

> Yep. As @rmarx EPIQ paper pointed out, implementations are diverse and constantly evolving. It would be super good for reproducibility to list all characteristics of the implementations under test e.g. the library, the driving application, compiler versions and options, software configuration options, system configuration etc. Having a testbed project makes that implicit, but it would be fair to include all of these alongside presentation of results.

I totally agree. We included some of the configuration settings in that paper, but not in enough detail. I think it's a good idea to cover as many parameters as possible (as long as they potentially relate to the performance metrics) in the next iteration.

dtikhonov commented 3 years ago

@Winters123, what does the "establish time" in establish_time: xxxMbps mean? Should it be a timestamp?

Also, do you expect the client always to open just one connection?

Winters123 commented 3 years ago

@nibanks this is a very cool idea and could be a good replacement for the test we did. Most of the time we spent on the test went into reading the source code, adding instrumentation, and so on. It would save plenty of time if all the implementations provided standardized APIs or options to record the performance metrics.

One specific test I think could be added is: the number of idle connections a server can maintain at a time.

This is mentioned here, but I also noticed elsewhere that whether idle connections are maintained by keep-alives is application-specific.

If this is a widely-used feature then adding it to the QUIC perf would be great. But I'm not sure if that's the case.

nibanks commented 3 years ago

@Winters123 feel free to send a PR for any changes you think should be made. I'd be happy to work with you on them. BTW, I just merged a PR into msquic to add support for the quic perf protocol in my spec.

Winters123 commented 3 years ago

> @Winters123, what does the "establish time" in establish_time: xxxMbps mean? Should it be a timestamp?

Exactly, a timestamp will do (us/ms). It would represent the time elapsed since the handshake finished.

> Also, do you expect the client always to open just one connection?

I think so. And the multi-connection test case is usually done by initializing multiple clients and connecting to the same server at the same time.
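That many-clients-one-server pattern could be sketched like this, with a dummy client function standing in for a real QUIC client invocation (e.g. launching a client binary via subprocess):

```python
# Sketch of the multi-client pattern: N independent client workers all
# target the same server concurrently. client_fn is a placeholder for
# invoking a real QUIC client and returning its stats.
from concurrent.futures import ThreadPoolExecutor

def run_clients(n_clients, client_fn):
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return list(pool.map(client_fn, range(n_clients)))

# Dummy client that would normally connect and report its results.
results = run_clients(8, lambda i: {"client": i, "ok": True})
print(len(results))  # 8
```

Separate client processes (rather than threads) would isolate the clients from each other better, at the cost of more setup per connection.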

nibanks commented 3 years ago

IMO, all measurements should be taken from the client's perspective. Additionally, you should always run the tests multiple times and take a mean or median of the results.

Winters123 commented 3 years ago

> IMO, all measurements should be from the client perspective. Additionally, you should always run the tests multiple times and take a mean or median of the results.

I totally agree with using a mean or median as the result, and that's why we collect the throughput on a per-second basis.

But I guess there are cases where obtaining the values from the server would be better or simpler. For instance, if we care about the number of idle connections a server can hold, then recording those numbers on the server would be the straightforward way.

nibanks commented 3 years ago

How exactly do you measure that number of connections? What do you expect to happen when that supposed limit (if it exists) is reached?

Winters123 commented 3 years ago

I don't have an exact idea of how to do that at the moment... But maybe enabling the keep-alive feature on the client side and counting the total number of handshakes on the server would do?

I assume that if the number of connections is very large, the server's memory could be saturated by dealing with so many packets and so much state. But I haven't done any tests on it, so I'm just guessing...
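As a back-of-the-envelope check (both numbers below are assumptions, not measurements from any implementation):

```python
# Estimate how many idle connections fit in a memory budget, given an
# assumed per-connection state size. A real test would instead measure
# server RSS while ramping the connection count.

def max_idle_conns(mem_budget_bytes: int, per_conn_bytes: int) -> int:
    return mem_budget_bytes // per_conn_bytes

# e.g. a 1 GiB budget with a guessed ~16 KiB of state per connection
print(max_idle_conns(1 << 30, 16 << 10))  # 65536
```

If per-connection state is anywhere near that guess, 50,000+ idle connections on one core is plausible memory-wise, and the real limits would show up in buffers and kernel-side UDP state instead.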

dtikhonov commented 3 years ago

> if the number of connections is very large the memory of the server could be possibly saturated

This would be a good test.

nibanks commented 3 years ago

@Winters123 if you could spell out exactly how the client should execute the test here then we could make it a standard test, and I could add an implementation to msquic.

Winters123 commented 3 years ago

That's a great idea. I'll try to add it to the doc.

victorstewart commented 3 years ago

@nibanks @dtikhonov @Winters123

so at a large number of idle connections, yes memory usage.

but also latency for clients to receive responses, and throughput cannibalization... because for every idle socket you need a timer to trigger every 30 seconds (because NATs and firewalls more aggressively terminate idle UDP than TCP) to send a keep-alive packet.

so the more idle sockets, the more of your potential throughput is consumed by maintenance.
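that maintenance load is easy to estimate; the 30-second interval and 50-byte packet size below are assumptions, not measured values:

```python
# Keep-alive maintenance load: one packet per connection per interval.
# Interval and packet size are assumed placeholders.

def keepalive_load(n_conns: int, interval_s: float, pkt_bytes: int):
    pps = n_conns / interval_s   # keep-alive packets per second
    bps = pps * pkt_bytes * 8    # bits per second of maintenance traffic
    return pps, bps

pps, bps = keepalive_load(n_conns=100_000, interval_s=30.0, pkt_bytes=50)
print(round(pps, 1))  # ~3333.3 packets/sec
```

so at 100k connections the bandwidth cost is tiny (~1.3 Mbps under these assumptions), but several thousand extra packets per second of send/receive work per core is where the cannibalization would come from.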

a lot of this is unavoidable but design choices can play a significant part i assume.

but also, you never know, something could spontaneously break with the connection state machine 🤷🏿‍♂️. nothing's guaranteed until tested.