Open libor-peltan-cznic opened 1 month ago
I think the most important timeout is the user timeout - i.e. the amount of time since the client starts processing a query, until the time it receives the answer.
But I agree other timeouts might be useful as well. DNS Shotgun also uses:
handshake_timeout
: the amount of time after which an attempt to finish a handshake is abandoned
idle_timeout
: the amount of time for which the established connection remains open even if there are no queries sent (applies to stateful protocols)
Please note that the handshake/idle timeouts are parameters which might differ for each DNS Metrics object. For example, one subid sender might use an idle_timeout of 0 to simulate clients which aggressively close connections as soon as they get an answer, while another subid sender could use a value like 10 to simulate more well-behaved clients.
Putting these into a header might be too limiting. Perhaps the header could contain definitions of various timeout configurations which could then be referenced in the DNS Metrics object?
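To make the "define once in the header, reference from each object" idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `timeout_profiles` key, the profile names, the `timeout_profile` reference field and the `handshake_timeout` value of 5 are made up for illustration and are not part of any existing format. Only the `idle_timeout` values 0 and 10 come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutProfile:
    """Hypothetical named set of timeouts (field names borrowed from DNS Shotgun)."""
    handshake_timeout: float  # time after which an unfinished handshake is abandoned
    idle_timeout: float       # time an established connection stays open without queries


# Hypothetical header section: each profile is defined once, keyed by name.
header = {
    "timeout_profiles": {
        "aggressive": TimeoutProfile(handshake_timeout=5, idle_timeout=0),
        "well_behaved": TimeoutProfile(handshake_timeout=5, idle_timeout=10),
    }
}

# Hypothetical metrics objects refer to a profile by name instead of repeating values.
metrics = [
    {"subid": "sender-a", "timeout_profile": "aggressive"},
    {"subid": "sender-b", "timeout_profile": "well_behaved"},
]


def resolve_timeouts(metric, header):
    """Look up the timeout profile that a metrics object refers to."""
    return header["timeout_profiles"][metric["timeout_profile"]]


for metric in metrics:
    print(metric["subid"], resolve_timeouts(metric, header))
```

The point of the indirection is that the timeout values are written once per distinct configuration, not once per metrics object.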
In practice, whichever way we measure any kind of latency, we can equivalently impose a timeout on it (i.e. a ceiling on the latency, beyond which the query counts as some kind of failure).
What I think we could do is remove this header/timeout for now, first design a robust system of measured latencies, and only after that think about a system for declaring timeouts. Does anyone have a better idea?
Generally I agree but I don't have a good idea how to express it without endless repetition.
My view: The "timeout" value is mostly used for interpreting the data. If I have data like this:
a timeout value set to 1000 tells me that it was either a packet drop or a response slower than 1000 time units.
Timeouts are a can of worms because it's also debatable when you start the timer etc. E.g. if a load simulator like Shotgun generates "traffic like from real users", it probably wants to measure end-to-end latency for individual queries, i.e. start the timer when the "user wanted to send the query" and stop it only after receiving a response. In this case the latency/timeout would include potential TCP/TLS/DoH session setup etc.
Another tool might want to measure DNS request/response latency and exclude connection setup from that. :exploding_head:
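A small Python sketch of how the two definitions diverge for the same query. The timestamp names and the numbers are hypothetical and only illustrate that the choice of timer start changes the reported latency.

```python
from dataclasses import dataclass


@dataclass
class QueryTimestamps:
    """Hypothetical per-query timestamps, all in the same time unit."""
    wanted_to_send: float     # the "user wanted to send the query" moment
    request_sent: float       # request written after any TCP/TLS/DoH session setup
    response_received: float  # response fully received


def end_to_end_latency(ts):
    """Latency as a user sees it, including connection setup."""
    return ts.response_received - ts.wanted_to_send


def request_response_latency(ts):
    """Latency of the DNS exchange alone, excluding connection setup."""
    return ts.response_received - ts.request_sent


# Made-up example: 30 units of session setup, then a 12-unit DNS exchange.
ts = QueryTimestamps(wanted_to_send=0.0, request_sent=30.0, response_received=42.0)
print(end_to_end_latency(ts), request_response_latency(ts))
```

With a timeout of, say, 40 units, this query would count as timed out under the first definition and as answered under the second, which is why the timer start has to be stated somewhere.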
> The "timeout" value is mostly used for interpreting the data.
It shouldn't be. At least that's not the case in Shotgun: timed-out queries should be accounted for by being present in the very last latency bucket.
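Roughly what I mean, as a Python sketch. This is not Shotgun's actual code; the bucket bounds are made up, and representing an unanswered query as `None` is an assumption of the sketch.

```python
import bisect

# Hypothetical bucket upper bounds, in the same time unit as the latencies.
# In this sketch the last bound doubles as the timeout.
UPPER_BOUNDS = [10, 100, 1000]


def bucket_index(latency, upper_bounds=UPPER_BOUNDS):
    """Return the bucket index for one query.

    A latency of None (no response at all) or anything above the last bound
    lands in the extra, very last bucket.
    """
    if latency is None or latency > upper_bounds[-1]:
        return len(upper_bounds)
    return bisect.bisect_left(upper_bounds, latency)


def histogram(latencies, upper_bounds=UPPER_BOUNDS):
    """Count queries per bucket; the result has len(upper_bounds) + 1 entries."""
    counts = [0] * (len(upper_bounds) + 1)
    for latency in latencies:
        counts[bucket_index(latency, upper_bounds)] += 1
    return counts


counts = histogram([5, 50, 500, None, 2000])
print(counts)  # [1, 1, 1, 2]
```

Here the dropped query and the too-slow one both end up in the last bucket, so no separate timeout field is needed to interpret the histogram.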
> What I think we could do is remove this header/timeout for now, first design a robust system of measured latencies, and only after that think about a system for declaring timeouts.
Agreed. It's better to have no way of representing timeouts for now than to have a vague, unclear value in a place where it might not belong.
> Timeouts are a can of worms because it's also debatable when you start the timer etc.
Right, the responses in latency buckets might mean different things for different tools. I can't think of a way around that other than to have some optional metadata field in the header which would specify what the query latencies actually represent.
What does this actually mean? Unless the transport is UDP, there are various timeouts that can be used, and they usually differ, e.g. handshake timeout, I/O timeout, idle timeout...
I suggest either removing this or improving it to accommodate various kinds of timeout information.