header/timeout has none or little sense

libor-peltan-cznic commented 1 month ago

"timeout": {
    "description": "The network timeout used in the generator, in fractional seconds",
    "type": "number"
}

What does this actually mean? Unless the transport is UDP, there are various timeouts that can apply, and they usually differ, e.g. handshake timeout, I/O timeout, idle timeout.

I suggest either removing this or extending it to accommodate the various kinds of timeout.

nicki-krizek commented 1 month ago

I think the most important timeout is the user timeout, i.e. the time from when the client starts processing a query until it receives the answer.

But I agree other timeouts might be useful as well. DNS Shotgun also uses:

- handshake_timeout: the amount of time after which an attempt to finish a handshake is abandoned
- idle_timeout: the amount of time for which an established connection remains open even if no queries are sent (applies to stateful protocols)

Please note that the handshake and idle timeouts are parameters that might differ for each DNS Metrics object. For example, one subid sender might use an idle_timeout of 0 to simulate clients which aggressively close connections as soon as they get an answer, while another subid sender could use a value like 10 to simulate more well-behaved clients.

Putting these into a header might be too limiting. Perhaps the header could contain definitions of various timeout configurations which could then be referenced in the DNS Metrics object?
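To make the idea concrete, here is a rough sketch of what "definitions in the header, references in the DNS Metrics object" could look like. The field names `timeout_profiles` and `timeout_profile`, the profile names, and the handshake value of 5.0 are all made up for illustration; nothing here is part of the current schema. Only the idle_timeout values 0 and 10 come from the example above.

```python
# Hypothetical layout: these field names are NOT part of the dns-metrics schema.
header = {
    "timeout_profiles": {
        # handshake_timeout of 5.0 is an arbitrary illustrative value
        "aggressive": {"handshake_timeout": 5.0, "idle_timeout": 0.0},
        "well_behaved": {"handshake_timeout": 5.0, "idle_timeout": 10.0},
    }
}

# Each DNS Metrics object names a profile instead of repeating the values.
metrics_objects = [
    {"subid": "sender-1", "timeout_profile": "aggressive"},
    {"subid": "sender-2", "timeout_profile": "well_behaved"},
]


def resolve_timeouts(header, metrics):
    """Return the timeout settings that a metrics object refers to."""
    return header["timeout_profiles"][metrics["timeout_profile"]]


idle_timeouts = [resolve_timeouts(header, m)["idle_timeout"] for m in metrics_objects]
```

This keeps per-sender differences expressible without repeating the full timeout set in every object.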

libor-peltan-cznic commented 1 month ago

In practice, for every kind of latency we measure, we can equivalently impose a timeout (i.e. a latency ceiling beyond which the operation counts as a failure).

What I think we could do is remove this header/timeout for now, first design a robust system of measured latencies, and only then think about a system for declaring timeouts. Does anyone have a better idea?

pspacek commented 1 month ago

Generally I agree but I don't have a good idea how to express it without endless repetition.

My view: The "timeout" value is mostly used for interpreting the data. If I have data like this:

requests sent = 1000
latency histogram
- under 50 ms = 990 responses
- under 100 ms = 9 responses
- one request is unaccounted for; the timeout value of 1000 tells me that it was either a dropped packet or a response slower than 1000 time units
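The arithmetic behind that reading, as a small sketch. It assumes the histogram buckets are disjoint (a response counted under "under 100 ms" is not also counted under "under 50 ms"); the dictionary layout and the function name are only for illustration.

```python
requests_sent = 1000

# Upper bound in ms -> number of responses, read as disjoint buckets (assumption).
latency_histogram = {50: 990, 100: 9}


def unaccounted(sent, histogram):
    """Requests that appear in no latency bucket: dropped or slower than the timeout."""
    return sent - sum(histogram.values())


missing = unaccounted(requests_sent, latency_histogram)
```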

Timeouts are a can of worms because it's also debatable when you start the timer etc. E.g. if a load simulator like Shotgun generates "traffic like from real users", it probably wants to measure end-to-end latency for individual queries, i.e. start the timer when the "user wanted to send the query" and stop it only after receiving a response. In this case the latency/timeout would include potential TCP/TLS/DoH session setup etc.

Another tool might want to measure DNS request/response latency and exclude connection setup from that. :exploding_head:
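A sketch of how the two definitions diverge for the same query. The timestamp names are hypothetical, and it simplifies by assuming the query goes out the moment the connection is ready.

```python
from dataclasses import dataclass


@dataclass
class QueryTimestamps:
    # All values in milliseconds on the same clock; names are illustrative only.
    wanted_to_send: float      # the "user wanted to send the query"
    connection_ready: float    # TCP/TLS/DoH session setup finished, query sent
    response_received: float


def end_to_end_latency(t):
    """Latency as a user would see it, including connection setup."""
    return t.response_received - t.wanted_to_send


def request_response_latency(t):
    """Latency of the DNS exchange alone, excluding connection setup."""
    return t.response_received - t.connection_ready


t = QueryTimestamps(wanted_to_send=0.0, connection_ready=30.0, response_received=45.0)
```

With these made-up timestamps the same query is a 45 ms query for one tool and a 15 ms query for the other, so it can land in different latency buckets depending on the tool.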

nicki-krizek commented 1 month ago

> The "timeout" value is mostly used for interpreting the data.

It shouldn't be; at least that's not the case in Shotgun. Timed-out queries should be accounted for in the very last latency bucket.
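Roughly what I mean, as a sketch rather than Shotgun's actual code: the timeout is the last bucket boundary, and anything that did not get an answer before it falls into one final bucket. The function name and the use of `None` for "no answer" are assumptions of this example.

```python
import bisect


def bucket_counts(latencies_ms, upper_bounds_ms):
    """Count queries per latency bucket.

    upper_bounds_ms is sorted and its last entry is the timeout. Bucket i holds
    latencies below upper_bounds_ms[i]; one extra final bucket holds queries
    that got no answer (None) or were answered at or after the timeout.
    """
    counts = [0] * (len(upper_bounds_ms) + 1)
    for latency in latencies_ms:
        if latency is None:
            index = len(upper_bounds_ms)  # timed out: the very last bucket
        else:
            index = bisect.bisect_right(upper_bounds_ms, latency)
        counts[index] += 1
    return counts


counts = bucket_counts([10, 49, 75, 500, None], [50, 100, 1000])
```

The sum of all buckets then always equals the number of queries sent, so nothing is left unaccounted for.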

> What I think we could do is remove this header/timeout for now, first design a robust system of measured latencies, and only then think about a system for declaring timeouts.

Agreed. It's better to have no way of representing timeouts for now than to have a vague, unclear value in a place where it might not belong.

> Timeouts are a can of worms because it's also debatable when you start the timer etc.

Right, the responses in latency buckets might mean different things for different tools. I can't think of a way around that other than to have some optional metadata field in the header which would specify what the query latencies actually represent.
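One possible shape for such a field, purely as a sketch: the key `latency_semantics` and its two values are invented here and are not part of the schema.

```python
from enum import Enum


class LatencySemantics(Enum):
    # Hypothetical values; the real set would need to be agreed on.
    END_TO_END = "end_to_end"              # timer includes connection setup
    REQUEST_RESPONSE = "request_response"  # timer covers the DNS exchange only


def latency_semantics(header):
    """Read the optional field; None means the tool did not say."""
    value = header.get("latency_semantics")
    return None if value is None else LatencySemantics(value)
```

A consumer could then refuse to compare two result files whose declared semantics differ, or at least warn about it.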

DNS-OARC / dns-metrics

header/timeout has none or little sense #15