Open johnhtodd opened 3 months ago
AFAIK dnsdist just passes the packet received from the resolver (including the embedded EDE, if available) to the dnstap stream. The dnstap message itself has no EDE field, i.e. dnsdist does not do any processing wrt EDE. So it would be of interest to see the actual answer received by dnsdist and the corresponding dnstap message produced.
It could be that the EDE 0 (Other) codes are already in the answer sent by the resolver. It would also be interesting to see if there's any extra text associated with the EDE 0 code.
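One way to check what is really in the packet, independent of any dnstap consumer, is to decode the EDE option directly from the raw wire-format response (for example, the response bytes carried inside a dnstap frame). The following is only a minimal sketch using the Python standard library: the helper names `skip_name` and `extract_ede` are hypothetical, and it assumes a well-formed, untruncated message. It walks the records, finds the OPT pseudo-record (type 41), and returns every EDE option (option code 15) as an info code plus extra text.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code for Extended DNS Errors (RFC 8914)
OPT_RR_TYPE = 41      # OPT pseudo-record type (RFC 6891)


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[off]
        if length == 0:             # root label terminates the name
            return off + 1
        if length & 0xC0 == 0xC0:   # compression pointer: 2 bytes, ends the name
            return off + 2
        off += 1 + length           # ordinary label: length byte + label bytes


def extract_ede(msg: bytes) -> list:
    """Return [(info_code, extra_text), ...] found in a raw DNS message.

    Hypothetical helper: assumes a complete, well-formed message.
    """
    qd, an, ns, ar = struct.unpack_from("!4H", msg, 4)
    off = 12                                  # fixed DNS header size
    for _ in range(qd):
        off = skip_name(msg, off) + 4         # skip QNAME, QTYPE, QCLASS
    results = []
    for _ in range(an + ns + ar):
        off = skip_name(msg, off)
        rtype, _rclass, _ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        off += 10                             # TYPE + CLASS + TTL + RDLENGTH
        if rtype == OPT_RR_TYPE:
            pos, end = off, off + rdlen
            while pos + 4 <= end:             # iterate over EDNS options
                code, optlen = struct.unpack_from("!HH", msg, pos)
                pos += 4
                if code == EDE_OPTION_CODE and optlen >= 2:
                    info_code = struct.unpack_from("!H", msg, pos)[0]
                    extra = msg[pos + 2:pos + optlen].decode("utf-8", "replace")
                    results.append((info_code, extra))
                pos += optlen
        off += rdlen
    return results
```

If this returns info code 22 for a response that the dnstap consumer reports as info code 0, the discrepancy is introduced after dnsdist; if it returns 0, the resolver's answer already carries EDE 0.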
I am 100% sure that the queries I am generating/receiving are coming back with "; EDE: 22 (No Reachable Authority): (delegation publicbt.com)" as the result (see "dig" results). Those exact queries are creating results that show up in Vector (my dnstap parser) with "info code:0". Now that I think about it, this may be a bug in Vector, as it is suspicious that info code "0" is the result in these events: zero is an easy number to reach in a bug condition. Let me pursue that path for a bit. I have nothing that can easily unpack a dnstap message with EDE other than Vector, so it may take a bit.
Short description
Inconsistent messages in dnstap for EDE versus what is provided in query response
Environment
I'm looking at DNSSEC errors (coincidentally, in Amsterdam) for a day or so, and trying to figure out our classes of errors that are handed back in EDE which create a SERVFAIL towards the end user. I've trimmed down the error set - I excluded "No reachable authority" errors (which are rampant)
Here is the set from 24 hours excluding "no reachable authority", from a small sub-section of our AMS cluster.
So what are all those "other error" items? This seems to be an unusually large number in the "catchall" category.
I dug into this a bit, and I need some sanity checking, or perhaps this is a bug.
I found a domain that is coming up with "other error" as reported in the dnstap data set - tracker.publicbt.com. There are ~6000 of those in one of my logfiles, so I figured it would be a good test.
When I look at dnsviz, this is a "refused" error, and sure enough when I do a "dig" I get a no reachable authority result:
But when I look through the dnstap logs, I find that they are not being listed as "no reachable authority" but in fact are showing up as "other error" (info code 0). I find no events in the dnstap output that show "no reachable authority" for that name, even though the name appears hundreds of times. All of the errors are "other error", which does not match what I see in my actual query results.
I am collecting the data from dnstap, which is sent by dnsdist. pdns-rec is of course behind dnsdist, along with (as usual) unbound, which we currently do not have sending EDE results (therefore, unbound answers never appear with any EDE data set, so they are not considered in my searches).
Is this a dnsdist error with dnstap? Or is this a method problem?