dnsviz / dnsviz

GNU General Public License v2.0

Error reporting on records that exist on one server but not on another #137

Open fcelda opened 1 month ago

fcelda commented 1 month ago

I have a testing zone for multi-signer DNSSEC with three signers. One of the signers includes CDS/CDNSKEY records but the others don't. DNSViz reports an EXISTING_TYPE_NOT_IN_BITMAP error on the NSEC records, and I wonder if that has ever been the case and if it is the desired behavior.

Please see the part of the graph with the zone records and notice the CDNSKEY/CDS records signed by key ID 2371 and then the denials signed by 44688 and 43080:

[Image: rfc8901_dev]

The individual responses from every DNS provider are valid. The response either contains the signed record or a valid proof of non-existence. This is also how the resolver will validate the response because the answers are self-contained. DNSViz however collects all responses and then checks a few properties on top of them.
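The difference between validating each response on its own and checking the collected responses against each other can be sketched as follows. This is only a minimal illustration, not DNSViz's actual implementation: the `Observation` type, the `cross_server_conflicts` function, and the server labels are hypothetical, and the data is hand-written to mirror the situation described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One response for a single name from one server (hypothetical model)."""
    server: str
    rdtype: str
    answered: bool  # True if the queried RRset was returned
    # Type bitmap of the NSEC record when the response was a NODATA denial.
    nsec_bitmap: Optional[FrozenSet[str]] = None


def cross_server_conflicts(
    observations: List[Observation],
) -> List[Tuple[str, str, str]]:
    """Return (rdtype, answering_server, denying_server) for every type that
    one server returned while another server's NSEC bitmap omits it."""
    positives = [o for o in observations if o.answered]
    denials = [o for o in observations if o.nsec_bitmap is not None]
    conflicts = []
    for pos in positives:
        for neg in denials:
            if pos.rdtype not in neg.nsec_bitmap:
                conflicts.append((pos.rdtype, pos.server, neg.server))
    return conflicts


# Each response is valid in isolation; the conflict only appears when
# the two are compared.
observations = [
    Observation("signer-1", "CDNSKEY", answered=True),
    Observation(
        "signer-2", "CDNSKEY", answered=False,
        nsec_bitmap=frozenset({"A", "NS", "SOA", "RRSIG", "NSEC", "DNSKEY"}),
    ),
]
print(cross_server_conflicts(observations))
```

A validating resolver only ever sees one of these responses at a time, which is why each of them validates while the combined view shows a contradiction.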

I think there can be plenty of similar inconsistencies between responses in multi-vendor environments that are harmless and don't even need to be related to DNSSEC. It could be a feature disparity between the vendors, ALIAS-like record processing, etc. I also believe that this can happen temporarily with traditional DNS when records are updated and transferred to secondary servers with a delay.

Overall, I think that in this particular case, DNSViz shouldn't report an error but a warning or nothing at all. I think an error indicates that DNS resolution is broken for the zone, which is not the case.

I admit the situation for CDS/CDNSKEY is specific because the records may be required for correct functionality of DS record management at the parent zone. However, I think that would still warrant only a warning.

What do you think?

I'm attaching the dnsviz probe result: rfc8901_dev.json.gz

cdeccio commented 1 month ago

Hi @fcelda, I'm thinking about this.

It is clear that there are discrepancies between the answers.

alice.ns.cloudflare.com claims that CDNSKEY exists, through the NSEC bitmap in a NODATA response for a different type:

$ dig +dnssec @alice.ns.cloudflare.com rfc8901.dev cname

; <<>> DiG 9.18.24-1-Debian <<>> +dnssec @alice.ns.cloudflare.com rfc8901.dev cname
; (6 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40914
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;rfc8901.dev.           IN  CNAME

;; AUTHORITY SECTION:
rfc8901.dev.        1800    IN  SOA alice.ns.cloudflare.com. dns.cloudflare.com. 2345327011 10000 2400 604800 1800
rfc8901.dev.        1800    IN  NSEC    \000.rfc8901.dev. A NS SOA HINFO MX TXT AAAA LOC SRV NAPTR CERT SSHFP RRSIG NSEC DNSKEY TLSA SMIMEA HIP CDS CDNSKEY OPENPGPKEY SVCB HTTPS URI CAA
rfc8901.dev.        1800    IN  RRSIG   SOA 13 2 1800 20240702174339 20240630154339 34505 rfc8901.dev. ieqz86ZFgHMBZxxKQo8TyXy2xHCa3m9AHa1UFp5Watpc1GA5/YagcI3k peXymjumGeLhoKwFbxxc2ioMCE/BGQ==
rfc8901.dev.        1800    IN  RRSIG   NSEC 13 2 1800 20240702174339 20240630154339 34505 rfc8901.dev. p4ajVJp4nzHJcsKrknQQys/lo5HveYWKXCth9yHmN1+uUko2EJ/p7OPa 5Zapx07BfSuS9ke803HyJdD3+4SAgw==

;; Query time: 24 msec
;; SERVER: 172.64.32.60#53(alice.ns.cloudflare.com) (UDP)
;; WHEN: Mon Jul 01 10:43:39 MDT 2024
;; MSG SIZE  rcvd: 358

alice.ns.cloudflare.com also shows that CDNSKEY exists when it is actually queried:

$ dig +dnssec @alice.ns.cloudflare.com rfc8901.dev cdnskey

; <<>> DiG 9.18.24-1-Debian <<>> +dnssec @alice.ns.cloudflare.com rfc8901.dev cdnskey
; (6 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17541
;; flags: qr aa rd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;rfc8901.dev.           IN  CDNSKEY

;; ANSWER SECTION:
rfc8901.dev.        1800    IN  CDNSKEY 257 3 13 mdsswUyr3DPW132mOi8V9xESWE8jTo0dxCjjnopKl+GqJxpVXckHAeF+ KkxLbxILfDLUT0rAK9iUzy1L53eKGQ==
rfc8901.dev.        1800    IN  CDNSKEY 257 3 13 3HU7Fzh0HptAlbd4DRe8SglLvc83Sz17RV7ZtnOddHP4KQCHS6TtkD8V evsk81BpIduvjePtmHWert5hjcT1yA==
rfc8901.dev.        1800    IN  CDNSKEY 257 3 13 t+4DPP+MFZ0Cr7gAXiDYv6HTyXzq/O2ESVRLc/ysuh5xBXKIsjsj5baV 1HzhBNo2F7mbsevsEo0/6UEL8+JBmA==
rfc8901.dev.        1800    IN  RRSIG   CDNSKEY 13 2 1800 20240830134140 20240630134140 2371 rfc8901.dev. TDVjdrDy3P5y5auYJPSzQMYlLuD+uDJAeg8jslXahhfRNmEmSraDlFxm huxns/L4xv27wOWCo8Svq/dU/KqIWA==

;; Query time: 24 msec
;; SERVER: 172.64.32.60#53(alice.ns.cloudflare.com) (UDP)
;; WHEN: Mon Jul 01 11:13:55 MDT 2024
;; MSG SIZE  rcvd: 387

However, a.ns.fcelda.cz claims, through its NSEC bitmap, that CDNSKEY does not exist:

$ dig +dnssec @a.ns.fcelda.cz rfc8901.dev cdnskey

; <<>> DiG 9.18.24-1-Debian <<>> +dnssec @a.ns.fcelda.cz rfc8901.dev cdnskey
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52871
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;rfc8901.dev.           IN  CDNSKEY

;; AUTHORITY SECTION:
rfc8901.dev.        1200    IN  SOA a.ns.fcelda.cz. hostmaster.fcelda.cz. 221 1200 1200 2678400 1200
rfc8901.dev.        1200    IN  NSEC    rfc8901.dev. A NS SOA TXT AAAA RRSIG NSEC DNSKEY CAA
rfc8901.dev.        1200    IN  RRSIG   SOA 13 2 1800 20240715135642 20240701122642 43808 rfc8901.dev. MgA9jGADzRJp2nMVOnY/pFRS8Jn5YbfeD6MCdR3lA8hfb29swKVAFNUo cyTFGECigyOYs8wYgieIDZd0P9REBg==
rfc8901.dev.        1200    IN  RRSIG   NSEC 13 2 1200 20240715135642 20240701122642 43808 rfc8901.dev. Y9xjT3ywRUDPJ8A5+IMSu/Ud2RYHDFGz/CQXXFT6hEDVuEoVRHQV6LsW FQdFF5Q1Suds2RLiZHvYDjxTXPbkjQ==

;; Query time: 132 msec
;; SERVER: 45.76.37.63#53(a.ns.fcelda.cz) (UDP)
;; WHEN: Mon Jul 01 10:43:53 MDT 2024
;; MSG SIZE  rcvd: 352
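To make the comparison of the two bitmaps explicit, here is a small sketch that parses the NSEC lines from the dig outputs above and checks whether CDNSKEY appears in each type bitmap. The `Nsec` tuple and `parse_nsec` helper are hypothetical names introduced for illustration, and the parser only handles the simple one-line presentation format that dig prints.

```python
from typing import FrozenSet, NamedTuple


class Nsec(NamedTuple):
    owner: str
    next_name: str
    types: FrozenSet[str]


def parse_nsec(line: str) -> Nsec:
    """Parse one NSEC record as printed by dig:
    owner TTL class NSEC next-name type [type ...]"""
    fields = line.split()
    if len(fields) < 5 or fields[3] != "NSEC":
        raise ValueError("not an NSEC record: %r" % line)
    return Nsec(fields[0], fields[4], frozenset(fields[5:]))


# NSEC records copied from the responses above (whitespace collapsed).
CLOUDFLARE_NSEC = (
    r"rfc8901.dev. 1800 IN NSEC \000.rfc8901.dev. A NS SOA HINFO MX TXT AAAA "
    "LOC SRV NAPTR CERT SSHFP RRSIG NSEC DNSKEY TLSA SMIMEA HIP CDS CDNSKEY "
    "OPENPGPKEY SVCB HTTPS URI CAA"
)
FCELDA_NSEC = (
    "rfc8901.dev. 1200 IN NSEC rfc8901.dev. "
    "A NS SOA TXT AAAA RRSIG NSEC DNSKEY CAA"
)

cloudflare = parse_nsec(CLOUDFLARE_NSEC)
fcelda = parse_nsec(FCELDA_NSEC)

for label, nsec in (("alice.ns.cloudflare.com", cloudflare),
                    ("a.ns.fcelda.cz", fcelda)):
    print(label, "CDNSKEY in bitmap:", "CDNSKEY" in nsec.types)

# Types that only the first bitmap asserts at the apex.
print(sorted(cloudflare.types - fcelda.types))
```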

It seems to me that the two possible scenarios in which these meet are the following:

  1. resolver queries an auth server (e.g., alice.ns.cloudflare.com) and receives a NODATA response, indicating that CDNSKEY exists;
  2. resolver needs CDNSKEY and doesn't have any reason to believe that it doesn't exist (based on the previous NODATA response), so it queries for CDNSKEY and either gets an answer (from alice.ns.cloudflare.com, consistent with the previous NODATA response) or gets another NODATA response (from a.ns.fcelda.cz).

Or:

  1. resolver queries an auth server (e.g., a.ns.fcelda.cz) and receives a NODATA response, indicating that CDNSKEY does not exist;
  2. resolver needs CDNSKEY but checks its cache and sees that it doesn't exist (based on the previous NODATA response), so it never queries for CDNSKEY, even though it exists. This is where the inconsistency has potential consequences.
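The two scenarios can be played through with a toy model. This is a rough sketch under strong assumptions: `ToyResolver`, `SERVERS`, and `lookup_cdnskey` are hypothetical, the resolver is assumed to reuse a cached NSEC bitmap to answer later queries without going upstream (the kind of behavior described in RFC 8198), and signatures, TTLs, and server selection are ignored entirely.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical per-server view of the apex: the types each server says exist.
SERVERS: Dict[str, FrozenSet[str]] = {
    "server-with-cdnskey": frozenset({"A", "NS", "SOA", "DNSKEY", "CDS", "CDNSKEY"}),
    "server-without-cdnskey": frozenset({"A", "NS", "SOA", "DNSKEY"}),
}


class ToyResolver:
    """Caches the most recent NSEC bitmap and answers from it when it can."""

    def __init__(self) -> None:
        self.cached_bitmap: Optional[FrozenSet[str]] = None
        self.upstream_queries = 0

    def query(self, server: str, rdtype: str) -> bool:
        """Return True if rdtype is found, False on NODATA."""
        if self.cached_bitmap is not None and rdtype not in self.cached_bitmap:
            return False  # NODATA synthesized from cache, nothing sent upstream
        self.upstream_queries += 1
        types = SERVERS[server]
        if rdtype in types:
            return True
        self.cached_bitmap = types  # the NODATA response carries the NSEC bitmap
        return False


def lookup_cdnskey(first_server: str, second_server: str) -> bool:
    """An earlier CNAME query populates the cache, then CDNSKEY is needed."""
    resolver = ToyResolver()
    resolver.query(first_server, "CNAME")
    return resolver.query(second_server, "CDNSKEY")


for first in SERVERS:
    for second in SERVERS:
        print(first, "then", second, "->", lookup_cdnskey(first, second))
```

In this model the CDNSKEY lookup succeeds only when both queries land on the server that publishes the record, so the outcome depends on which server the resolver happens to pick.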

> I think there can be plenty of similar inconsistencies between responses in multi-vendor environments that are harmless and don't even need to be related to DNSSEC. It could be a feature disparity between the vendors, ALIAS-like record processing, etc. I also believe that this can happen temporarily with traditional DNS when records are updated and transferred to secondary servers with a delay.

That's true, and normally DNSViz doesn't warn or error with regard to different content of DNS records (e.g., if one server returns 192.0.2.1 and another returns 192.0.2.2). However, inconsistencies with NSEC(3) proofs were causing issues with resolvers, so I added consistency checks for comparing negative and positive responses. This has helped operators track down problems with their systems.

> Overall, I think that in this particular case, DNSViz shouldn't report an error but a warning or nothing at all. I think an error indicates that DNS resolution is broken for the zone, which is not the case.

It's true that my use of warning vs. error is a little inconsistent and has changed over time. Usually, it's an error if it results in an actual failure or it's a direct violation of the specification (e.g., "MUST NOT"). I could reconsider this as well.

But maybe I could ask... why would you want different versions of CDS/CDNSKEY record existence, even across different multi-vendor environments? Is there a good reason for that? There might be (which is why I'm asking), but I just can't think of it myself.

fcelda commented 1 month ago

> It seems to me that the two possible scenarios in which these meet are the following:

You are right and I agree this leads to non-deterministic resolution.

> > I think there can be plenty of similar inconsistencies between responses in multi-vendor environments that are harmless and don't even need to be related to DNSSEC. It could be a feature disparity between the vendors, ALIAS-like record processing, etc. I also believe that this can happen temporarily with traditional DNS when records are updated and transferred to secondary servers with a delay.
>
> That's true, and normally DNSViz doesn't warn or error with regard to different content of DNS records (e.g., if one server returns 192.0.2.1 and another returns 192.0.2.2). However, inconsistencies with NSEC(3) proofs were causing issues with resolvers, so I added consistency checks for comparing negative and positive responses. This has helped operators track down problems with their systems.

What issues do the inconsistencies cause for resolvers? I think resolvers need to cope with this because the inconsistencies can happen whenever a record is added or removed. Even the same server can send a different NSEC for two subsequent queries for the same name.

Cloudflare's online-signing implementation is especially surprising in that regard. For instance, neither MX nor TLSA exists at the apex of my testing zones, and yet both types are returned in the bitmap unless you ask for them.

% dig @alice.ns.cloudflare.com. rfc8901.dev. MX +dnssec | grep "IN\s\+NSEC" | sed 's/\s\+/ /g' | cut -d " " -f 6-   
A NS SOA HINFO TXT AAAA LOC SRV NAPTR CERT SSHFP RRSIG NSEC DNSKEY TLSA SMIMEA HIP CDS CDNSKEY OPENPGPKEY SVCB HTTPS URI CAA

% dig @alice.ns.cloudflare.com. rfc8901.dev. TLSA +dnssec | grep "IN\s\+NSEC" | sed 's/\s\+/ /g' | cut -d " " -f 6-
A NS SOA HINFO MX TXT AAAA LOC SRV NAPTR CERT SSHFP RRSIG NSEC DNSKEY SMIMEA HIP CDS CDNSKEY OPENPGPKEY SVCB HTTPS URI CAA

I actually just tried dnsviz probe with -R MX,TLSA and it doesn't yield any error or warning. So I assume this particular check is specific to CDS and CDNSKEY.

> > Overall, I think that in this particular case, DNSViz shouldn't report an error but a warning or nothing at all. I think an error indicates that DNS resolution is broken for the zone, which is not the case.
>
> It's true that my use of warning vs. error is a little inconsistent and has changed over time. Usually, it's an error if it results in an actual failure or it's a direct violation of the specification (e.g., "MUST NOT"). I could reconsider this as well.

Please do. :slightly_smiling_face: I think it should be a warning. It won't cause any immediate resolution errors for clients, but it may indicate a potential problem with the DS update automation.

> But maybe I could ask... why would you want different versions of CDS/CDNSKEY record existence, even across different multi-vendor environments? Is there a good reason for that? There might be (which is why I'm asking), but I just can't think of it myself.

In this particular case it's a feature disparity between the vendors: Cloudflare always includes CDS/CDNSKEY in the zone and it cannot be disabled (at least on the free account I have), and NS1 doesn't support CDS/CDNSKEY yet.

cdeccio commented 1 month ago

I haven't dropped this, but it requires some thinking that I haven't had time to give it yet.

cdeccio commented 1 month ago

Also, referencing #51, #76, and #114