PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

prometheus metrics: better understanding of SERVFAILS with RFC8914 Extended DNS Error counters #9733

Open appliedprivacy opened 3 years ago

appliedprivacy commented 3 years ago

Short description

When looking at SERVFAIL graphs built from dnsdist Prometheus metrics, the obvious question is: what is the root cause behind them? A recently published RFC aims to help with that: https://datatracker.ietf.org/doc/rfc8914/ (see also https://blog.cloudflare.com/unwrap-the-servfail/).

Usecase

Better understanding of the root cause behind SERVFAILs (if EDE data is available)

Description

It would be nice if each EDE code were counted and published individually in the Prometheus metrics whenever the information is available.

Current metrics

could be extended with an EDE label containing the codes:

ede= https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#extended-dns-error-codes

 Other
 Unsupported DNSKEY Algorithm
 Unsupported DS Digest Type
 Stale Answer
 Forged Answer
 DNSSEC Indeterminate
 DNSSEC Bogus
 Signature Expired
 Signature Not Yet Valid
 DNSKEY Missing
 RRSIGs Missing
 No Zone Key Bit Set
 NSEC Missing
 Cached Error
 Not Ready
 Blocked
 Censored
 Filtered
 Prohibited
 Stale NXDOMAIN Answer
 Not Authoritative
 Not Supported
 No Reachable Authority
 Network Error
 Invalid Data

example:

dnsdist_servfail_responses{ede="DNSSEC Bogus"} 10

In addition to those with an EDE present, it would be nice to also see the number of SERVFAILs with no EDE present.
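To make the request concrete, here is a minimal Python sketch of per-EDE counting rendered in the Prometheus text format. It is not dnsdist code. The metric name is taken from the example above; the `ServfailCounters` class and the `ede="none"` label value for SERVFAILs without an EDE option are assumptions made for illustration only.

```python
from collections import Counter

# INFO-CODE values and names as registered for RFC 8914 (same list as above).
EDE_NAMES = {
    0: "Other", 1: "Unsupported DNSKEY Algorithm", 2: "Unsupported DS Digest Type",
    3: "Stale Answer", 4: "Forged Answer", 5: "DNSSEC Indeterminate",
    6: "DNSSEC Bogus", 7: "Signature Expired", 8: "Signature Not Yet Valid",
    9: "DNSKEY Missing", 10: "RRSIGs Missing", 11: "No Zone Key Bit Set",
    12: "NSEC Missing", 13: "Cached Error", 14: "Not Ready", 15: "Blocked",
    16: "Censored", 17: "Filtered", 18: "Prohibited", 19: "Stale NXDOMAIN Answer",
    20: "Not Authoritative", 21: "Not Supported", 22: "No Reachable Authority",
    23: "Network Error", 24: "Invalid Data",
}

NO_EDE = "none"  # hypothetical label value for SERVFAILs carrying no EDE option


class ServfailCounters:
    """Hypothetical helper: counts SERVFAIL responses per EDE label."""

    def __init__(self):
        self._counts = Counter()

    def record(self, info_code=None):
        # info_code is None when the response carried no EDE option.
        if info_code is None:
            label = NO_EDE
        else:
            # Unregistered codes keep their numeric value so nothing is dropped.
            label = EDE_NAMES.get(info_code, str(info_code))
        self._counts[label] += 1

    def render(self):
        # One sample line per label value, sorted for stable output.
        lines = [
            f'dnsdist_servfail_responses{{ede="{label}"}} {value}'
            for label, value in sorted(self._counts.items())
        ]
        return "\n".join(lines) + "\n"
```

Calling `record(6)` ten times and `record()` once would yield one line for `ede="DNSSEC Bogus"` and one for `ede="none"`.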

rgacogne commented 3 years ago

I don't think we will implement this in dnsdist, as it would require parsing the response, which we try to avoid for performance (and feature creep) reasons. I realize it might be useful to centralize these counters when you have several backends, but I think it makes much more sense to have this implemented in the backend instead, as we are doing in https://github.com/PowerDNS/pdns/pull/9673.
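For readers wondering what "parsing the response" involves, the following is a rough Python sketch, not dnsdist code. It walks every section of a wire-format DNS message to find the OPT record (type 41, RFC 6891) and then scans its options for the EDE option (code 15, RFC 8914). The function names are hypothetical, and the sketch assumes a well-formed message; it does no bounds checking.

```python
import struct

EDNS_OPT_TYPE = 41    # OPT pseudo-RR type (RFC 6891)
EDE_OPTION_CODE = 15  # Extended DNS Error option code (RFC 8914)


def _skip_name(msg, pos):
    """Return the offset just past the domain name that starts at pos."""
    while True:
        length = msg[pos]
        if length == 0:            # root label terminates the name
            return pos + 1
        if length & 0xC0 == 0xC0:  # a compression pointer also ends the name
            return pos + 2
        pos += 1 + length


def extract_ede(msg):
    """Return a list of (info_code, extra_text) tuples found in a DNS message.

    Assumes a well-formed message; malformed input raises an exception.
    """
    qdcount, ancount, nscount, arcount = struct.unpack_from("!4H", msg, 4)
    pos = 12  # fixed header size
    for _ in range(qdcount):
        pos = _skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    found = []
    # Every record before the OPT RR has to be skipped one by one.
    for _ in range(ancount + nscount + arcount):
        pos = _skip_name(msg, pos)
        rtype, _rclass, _ttl, rdlen = struct.unpack_from("!HHIH", msg, pos)
        pos += 10
        if rtype == EDNS_OPT_TYPE:
            opt, end = pos, pos + rdlen
            while opt + 4 <= end:
                code, length = struct.unpack_from("!HH", msg, opt)
                opt += 4
                if code == EDE_OPTION_CODE and length >= 2:
                    (info_code,) = struct.unpack_from("!H", msg, opt)
                    text = msg[opt + 2:opt + length].decode("utf-8", "replace")
                    found.append((info_code, text))
                opt += length
        pos += rdlen
    return found
```

The point of the sketch is the cost: the OPT record normally sits in the additional section, so all preceding records must be traversed for every response before the EDE code can be read.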

appliedprivacy commented 3 years ago

Thanks for your explanation. Understood for SERVFAILs that are simply relayed from backends, but how about SERVFAILs that dnsdist generates itself (i.e. because it cannot reach any backend)?

rgacogne commented 3 years ago

For answers generated directly from dnsdist it's a different matter and I would be glad to have that feature.
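As an illustration of the self-generated case, a Python sketch of encoding an EDE option for attachment to the OPT record of a locally generated answer is shown here. The wire layout (OPTION-CODE 15, OPTION-LENGTH, INFO-CODE, EXTRA-TEXT) follows RFC 8914; the function name and the choice of info code in the usage note are assumptions, not anything dnsdist defines.

```python
import struct

EDE_OPTION_CODE = 15  # Extended DNS Error option code (RFC 8914)


def build_ede_option(info_code, extra_text=""):
    """Encode one EDE option as it would appear inside an OPT record's RDATA.

    Hypothetical helper: OPTION-CODE and OPTION-LENGTH are 16-bit fields,
    followed by the 16-bit INFO-CODE and the optional UTF-8 EXTRA-TEXT.
    """
    text = extra_text.encode("utf-8")
    return struct.pack("!HHH", EDE_OPTION_CODE, 2 + len(text), info_code) + text
```

For example, `build_ede_option(23, "no backend available")` would tag an answer with "Network Error". Whether 23 or another code such as 22 ("No Reachable Authority") best fits that situation is a design choice this sketch leaves open. Because the generating side already knows which code it set, it can increment the matching counter without parsing anything.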

rgacogne commented 3 years ago

There was also some interest in being able to extend RCodeAction to support EDE in https://github.com/PowerDNS/pdns/pull/7636, but it was for an early version of the draft.

johnhtodd commented 3 years ago

+1 for being able to count EDE values that are produced/set by dnsdist itself by any means, including those mentioned in #7636 (if fully implemented), or just for "simple" results like the backend(s) not being available or clients being prohibited via various dynamic or static rules.

appliedprivacy commented 2 years ago

"it makes much more sense to have this implemented in the backend instead, as we are doing in #9673."

Is there already a feature request to add EDE to the backend metrics (pdns_recursor_servfail_answers), or should we file one?

rgacogne commented 2 years ago

I don't think we have a feature request for EDE metrics in the recursor, no.