Tantalor93 / dnspyre

CLI tool for a high QPS DNS benchmark
https://tantalor93.github.io/dnspyre/
MIT License
84 stars 10 forks source link

Keepalive on/off ? #242

Open PenelopeFudd opened 2 months ago

PenelopeFudd commented 2 months ago

Describe the feature When running a TCP-based test (TCP/TLS/DoH/etc), it's hard to tell if connections are being reused or not. Could you add a flag to enable/disable connection reuse (aka keepalive)?

Why do you need this feature I'm trying to get our DoH server to handle 200,000 QPS with <100ms response time for separate users. If the dnspyre clients are using keepalive, then the results may differ from what we'll see in production.

Tantalor93 commented 2 months ago

that makes sense! There is kinda support for this for plain DNS over TCP or DNS over TLS Benchmarks by using --query-per-conn flag, you can specify --query-per-conn=1 and therefore force each worker to generate a new connection after every request

but for DoH and DoQ there is no option to disable built-in keep-alives of the client, I'll think about this!

PenelopeFudd commented 2 months ago

For the service we're setting up, each client will typically make one DNS request a minute or so. For this, I'd normally set -c to number of clients and -n to 1, but I really prefer the -d duration option, since I'm trying to find out how many clients per second over a longer period. Otherwise, there's ramp-up time and the long tail as connections end, which prevent the pipeline from being "full".

I do know that the first request of a TLS-based session is going to be slow because of the certificate processing that needs to be done, and that's going to happen for every client. At that point, using keepalive is almost cheating. :-)

PenelopeFudd commented 2 months ago

I'd guess the shortest path would be to make --query-per-conn=1 work for DoH and DoQ. It's already being parsed from the command line and describes what we want to do; all that's left is doing it.

PenelopeFudd commented 1 month ago

Dang, I just discovered that -c doesn't do what I thought it did with DoH.

dnspyre -n 5 -c 10 --server https://pdnsdist.example.org/dns-query -t TXT hello.doh-test.com --doh-protocol=2

I expected this to start 10 separate connections to the server (-c 10), which would involve a TLS handshake for each. Wireshark shows just one TCP connection, and just one TLS handshake. 😞 This blows all of my performance testing out of the water.

Tantalor93 commented 1 month ago

Dang, I just discovered that -c doesn't do what I thought it did with DoH.

dnspyre -n 5 -c 10 --server https://pdnsdist.example.org/dns-query -t TXT hello.doh-test.com --doh-protocol=2

I expected this to start 10 separate connections to the server (-c 10), which would involve a TLS handshake for each. Wireshark shows just one TCP connection, and just one TLS handshake. 😞 This blows all of my performance testing out of the water.

yea, that is default behaviour for HTTP/2 (also http/3 and DoQ). Single connection and requests are multiplexed on top of it, but I am now working on a change that will introduce flag, that will basically make each worker act as absolutely separate worker with separate connections

PenelopeFudd commented 1 month ago

When do you think that might be ready? Asking for a friend. :-)

Tantalor93 commented 1 month ago

I plan to release the feature in the next two weeks. Need to do some refactorings first, think about API, and test this properly. 🙂
In the meantime, you can use --doh-protocol 1.1 (that is also the default for DoH benchmarks) and that will make each worker establish a persistent connection, though each query done by the worker will most likely reuse the connection established for a first query done by the worker

PenelopeFudd commented 1 month ago

Ok, thanks. Do you know how much faster HTTP/2 might be than HTTP/1.1 in the same situation? I'm inclined to believe that they'd be about equal, but that's based on precisely no evidence.

PenelopeFudd commented 1 month ago

I'm having dreadful problems with HTTP/1.1; I run it, it works; I run it again, it fails. I suspect it's running out of ports because they're all in TIME_WAIT.

Tantalor93 commented 1 month ago

but I am now working on a change that will introduce flag, that will basically make each worker act as absolutely separate worker with separate connections

I am now drafting this change as part of https://github.com/Tantalor93/dnspyre/pull/252, by providing --separate-worker-connections it will be possible to force each concurrent worker to have separate connections to the DNS server (meaning separate TLS handshakes and everything that comes with it)

but for DoH and DoQ there is no option to disable built-in keep-alives of the client, I'll think about this!

regarding an option to disable keep-alives, I think such option would make sense only for DoH over HTTP/1.1.

For DoH over HTTP/2 or HTTP/3 and DoQ, it does not make sense as those are built around maintaining persistent connections.

Do you know how much faster HTTP/2 might be than HTTP/1.1 in the same situation

usually from my experience and various benchmarks I expect the DoH over HTTP/2 to be much more performant even two times than DoH over HTTP/1.1 🙂

PenelopeFudd commented 1 month ago

Oh, I see my problem: -c is the number of concurrent queries (--concurrency), not number of simulated clients. In some situations they mean the same thing, but not in others. 😖