Open jelu opened 1 month ago
@libor-peltan-cznic oh, this is a lot of properties. We kinda need to start somewhere, so which of these already exist today?
Note that "resumptions_try" and "resumptions" are a mess that ought to have been deleted. Please remove them.
Well, some of the handshake details are specifically needed only with DoT, so you might postpone those.
In any case, the vast majority should be optional, so we can start with anything. "Which exist today" is a question specific to whichever tool you have in mind. Probably "queries" and "replies" come first :)
@libor-peltan-cznic What of this does your tool already generate?
Well, our tool generates JSON output only in a development branch, which might be abandoned anyway: https://gitlab.nic.cz/knot/knot-dns/-/commits/xdpgun_json
So far it looks like this, but I repeat that it might be abandoned or altered, so I don't see any point in following it:
{
  "type": "header",
  "runid": 1716468905003400,
  "schema_version": 20221207,
  "merged": true,
  "time_units_per_sec": 1000000,
  "generator": "kxdpgun",
  "generator_params": [
    "-t",
    "10",
    "-b",
    "2",
    "-i",
    "/home/peltan/queries.txt",
    "-Q",
    "7",
    "-j",
    "--tcp",
    "dns.google"
  ],
  "generator_version": "3.4.dev0+1716308098.066968356",
  "stats_interval": 10000000,
  "interface": "wlo1",
  "threads": 1
}
{
  "type": "stats_sum",
  "runid": 1716468905003400,
  "since": 1716468893986294,
  "until": 1716468905003396,
  "requests": 72,
  "answers": 72,
  "responses": {
    "NOERROR": 3,
    "NXDOMAIN": 69
  }
}
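For anyone who wants to consume output shaped like the above, here is a minimal Python sketch. It assumes the tool emits several JSON objects back to back (as in the sample) and that the header carries time_units_per_sec. The helper names iter_json_objects and summarize are made up for illustration and are not part of kxdpgun, and the sample is a shortened copy of the output above.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Shortened copy of the sample output above.
SAMPLE = '''
{"type": "header", "runid": 1716468905003400, "time_units_per_sec": 1000000}
{"type": "stats_sum", "runid": 1716468905003400,
 "since": 1716468893986294, "until": 1716468905003396,
 "requests": 72, "answers": 72,
 "responses": {"NOERROR": 3, "NXDOMAIN": 69}}
'''

def summarize(text):
    """Derive duration and rates from the header and stats_sum records."""
    records = list(iter_json_objects(text))
    header = next(r for r in records if r["type"] == "header")
    stats = next(r for r in records if r["type"] == "stats_sum")
    duration_s = (stats["until"] - stats["since"]) / header["time_units_per_sec"]
    return {
        "duration_s": duration_s,
        "requests_per_s": stats["requests"] / duration_s,
        "answer_ratio": stats["answers"] / stats["requests"],
    }

summary = summarize(SAMPLE)
print(summary)
```

If time_units_per_sec is dropped in favour of fixed microseconds, the division would simply use 1000000.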
https://github.com/DNS-OARC/flamethrower/issues/99#issuecomment-2107862753 by @nicki-krizek
Feedback on the proposed format:
merged means that the data has been added up from all senders/threads. I don't think this should be a property of the header, but rather of the "data" entry (i.e. stats_periodic). It might also be a special reserved value for the threadid instead.
conn_active: number of active connections
conn_handshakes: number of handshakes performed
conn_handshakes_failed: number of failed handshakes
conn_resumed: number of connections resumed with TLS resumption
conn_quic_0rtt_loaded: number of connections for which 0RTT was accepted on a QUIC protocol level
quic_0rtt_sent: number of requests for which 0RTT data was used over QUIC
quic_0rtt_answered: number of answers which were received for requests sent as 0RTT data over QUIC
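To make the "merged" idea concrete, here is a rough Python sketch of adding up per-thread entries into one merged data entry. The record layout (a threadid field per entry, merged set on the summed entry rather than on the header) is only an assumption taken from the feedback above, and merge_thread_stats is a hypothetical helper, not something any tool provides.

```python
from collections import Counter

# Keys that describe the record itself rather than a counter (assumed layout).
NON_COUNTER_KEYS = ("type", "threadid", "merged")

def merge_thread_stats(per_thread):
    """Sum numeric counters across per-thread records into one merged record."""
    totals = Counter()
    for record in per_thread:
        for key, value in record.items():
            if key not in NON_COUNTER_KEYS:
                totals[key] += value
    merged = {"type": "stats_periodic", "merged": True}
    merged.update(totals)
    return merged

threads = [
    {"type": "stats_periodic", "threadid": 0,
     "conn_active": 3, "conn_handshakes": 10, "conn_handshakes_failed": 1},
    {"type": "stats_periodic", "threadid": 1,
     "conn_active": 2, "conn_handshakes": 7, "conn_handshakes_failed": 0},
]
merged = merge_thread_stats(threads)
print(merged)
```

With the reserved-threadid variant, the merged record would carry that reserved value in threadid instead of the merged flag.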
Header:
time_units_per_sec ... remove! Always use microseconds.
timestamp_start ... usec-precision timestamp of measurement start (can be zero, or unixtime, or anything random)
Stats:
since ... usec-precision timestamp of the beginning of this periodic stats interval
until ... ditto (end of the interval)
period_number ... sequential number of this periodic stats record
Event-based counters:
conn_init ... sent SYN packets (TCP) or Initials (QUIC, where it's equal to session_init + resumption_init)
conn_established ... TCP handshakes completed (in DoQ, this is equal to session_established + resumption_established)
session_init ... attempted initiating TLS session
session_established ... TLS/QUIC handshakes completed
resumption_init ... attempted resuming TLS session
resumption_established ... TLS/QUIC handshakes completed by resumption
resumption_fallbacks ... session resumption attempt denied by counterpart, continuing to session_init (NOTE: in case of DoQ, this can also lead to "lost" DNS queries)
resumptions_try ... TLS/QUIC session resumptions attempted
resumptions ... TLS/QUIC session resumptions successful (NOTE: this is ALSO counted as "established")
[TODO right now we are assuming that 0-RTT DNS query is sent on every successful resumption on DoQ and never on DoT]
queries ... sent DNS queries
replies ... received DNS replies
close_sent ... initiated graceful connection close (FIN sent in TCP)
closed ... gracefully closed connections
reset_send ... abruptly closed by us
reset_recv ... abruptly closed connections by counterpart (only those RSTs that belong to existing conns?)
unexpected_DNS ... received unexpected DNS packet
unexpected_raw ... received unexpected packet that is NOT(!) DNS (e.g. TCP SYN+ACK without matching SYN), any ballast on wire
discarded_DNS ... DNS queries from given input that were not sent (e.g. excessive size of DNS query on input)
discarded_raw ... any packet that could not be sent (BESIDES! discarded_DNS)
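The two DoQ equalities mentioned for conn_init and conn_established could be checked by a consumer roughly like this. This is only a sketch: check_doq_counters is a hypothetical helper, the counter values are invented, and it assumes all six counters are present in one record.

```python
def check_doq_counters(counters):
    """Return a list of violated DoQ identities (empty if all hold)."""
    problems = []
    # In DoQ, every Initial starts either a fresh session or a resumption.
    if counters["conn_init"] != counters["session_init"] + counters["resumption_init"]:
        problems.append("conn_init != session_init + resumption_init")
    # Likewise, every established connection completed one of the two handshakes.
    if counters["conn_established"] != (
        counters["session_established"] + counters["resumption_established"]
    ):
        problems.append(
            "conn_established != session_established + resumption_established"
        )
    return problems

# Invented example values, for illustration only.
consistent = {
    "conn_init": 10, "session_init": 6, "resumption_init": 4,
    "conn_established": 9, "session_established": 5, "resumption_established": 4,
}
inconsistent = dict(consistent, conn_init=11)

print(check_doq_counters(consistent))
print(check_doq_counters(inconsistent))
```

For plain TCP or DoT these equalities are not claimed above, so such a check would only make sense when the run is known to be DoQ.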
State-based periodic timers (also when the measurement is canceled prematurely, these will contain non-zero values at the end):
establishing ... handshakes pending
connected ... established connections alive
ongoing ... DNS queries waiting for reply (not timed out)
Size counters (in bytes, cumulative, not average!):
DNS_query_size ... DNS payload ONLY (even without 2-byte prefix in TCP)
DNS_reply_size
wire_size_out
wire_size_in
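Since the size counters are cumulative, averages would be derived by whoever reads the data. A small Python sketch of that, assuming the sizes are reported alongside the queries and replies counters in the same record (average_sizes and the numbers are made up for illustration):

```python
def average_sizes(counters):
    """Derive per-message averages (in bytes) from cumulative byte counters."""
    def ratio(total, count):
        # Avoid dividing by zero when nothing was sent or received.
        return total / count if count else None

    return {
        "avg_query_size": ratio(counters["DNS_query_size"], counters["queries"]),
        "avg_reply_size": ratio(counters["DNS_reply_size"], counters["replies"]),
    }

# Invented example values.
sizes = average_sizes(
    {"DNS_query_size": 3600, "queries": 72, "DNS_reply_size": 7200, "replies": 72}
)
print(sizes)
```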
Latency timers (in usecs, avg/min/max/stddev/buckets):
initial_latency ... within handshake, from initial packet sent to first reaction received
handshake_duration ... from initial packet to handshake (incl. TLS/QUIC) completed (possibly multiple round-trips)
response_latency ... from DNS query sent to DNS response received
user_latency ... from initial packet sent to DNS response received (including handshake ONLY if it was needed)
connection_duration ... from initial packet sent to connection closed (or timed out, failed, abandoned...)
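As an illustration of the avg/min/max/stddev/buckets idea, each latency timer could be summarized from raw usec samples roughly as follows. The bucket boundaries here are an arbitrary assumption (nothing above fixes them), the population standard deviation is one possible choice, and summarize_latency is a hypothetical helper.

```python
import bisect
import statistics

# Hypothetical upper bounds in usec: 1 ms, 10 ms, 100 ms, 1 s.
BUCKET_BOUNDS_USEC = [1_000, 10_000, 100_000, 1_000_000]

def summarize_latency(samples_usec, bounds=BUCKET_BOUNDS_USEC):
    """Summarize latency samples (usec) as avg/min/max/stddev/buckets."""
    if not samples_usec:
        return None
    # buckets[i] counts samples <= bounds[i] and above the previous bound;
    # the last bucket collects everything above the largest bound.
    buckets = [0] * (len(bounds) + 1)
    for sample in samples_usec:
        buckets[bisect.bisect_left(bounds, sample)] += 1
    return {
        "avg": statistics.fmean(samples_usec),
        "min": min(samples_usec),
        "max": max(samples_usec),
        "stddev": statistics.pstdev(samples_usec),  # population stddev
        "buckets": buckets,
    }

latency = summarize_latency([500, 1_500, 2_500, 250_000])
print(latency)
```

A generator would more likely keep running sums and bucket counts instead of storing every sample, but the reported fields would look the same.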
Originally posted by @libor-peltan-cznic in https://github.com/DNS-OARC/dns-metrics/issues/2#issuecomment-2125036563