DNSCrypt / dnscrypt-resolvers

Lists of public DNSCrypt / DoH DNS servers and DNS relays
https://dnscrypt.info

DNSSEC support #545

Closed taam closed 2 years ago

taam commented 3 years ago

(Disclaimer: I am not a DNS or DNSSEC expert or even close, my knowledge is limited, so you should take the following with care.)

The DNS Stamp specification says the DNSSEC flag means "the server supports DNSSEC". After #540, I found out that these servers actually validate DNSSEC if you ask them, but not by default. So I took the complete list (minus the servers without the DNSSEC flag) and ran some basic DNSSEC checks using various test domains. This is the current result:

+ passed                                                 :  127
+ passed (but sets AD flag even when not asked)          :   15
# questionable (does not validate by default)            :    1
! unknown (for bad domain, repeated timeouts)            :   17
! unknown (for bad domain, repeated HTTP code 502)       :    7
! unknown (for bad domain, repeated HTTP code 503)       :   11
- failed (bad answer for good domain)                    :    2
- failed (answer without AD flag for good domain)        :   19
- failed (HTTP code 503 instead of SERVFAIL)             :   24
- failed (no error for bad domain with AD flag)          :   15
- failed (unexpected answer for unknownalgorithm domain) :    1

Testing this is not easy: as you can see, for a number of servers (where each stamp is one server) there's no definitive result. But even when it looks like there is one, you can't really be sure, as many servers are actually a group of servers (load balanced/anycast/...), and it turned out that behavior within those server groups is unfortunately sometimes inconsistent.
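For readers who want to reproduce this kind of check, here is a minimal sketch of the two wire-level pieces involved: building a query that sets the DO bit (via an EDNS0 OPT record), and reading the AD flag and response code from a reply header. It is not the script used for the results above. The function names, the 1232-byte payload size, and the default query ID are assumptions chosen for illustration, and transport (DNSCrypt or DoH) is left out entirely.

```python
import struct

RD_FLAG = 0x0100   # Recursion Desired (header flags)
AD_FLAG = 0x0020   # Authenticated Data (header flags)
DO_FLAG = 0x8000   # DNSSEC OK (EDNS0 extended flags)
OPT_TYPE = 41      # EDNS0 OPT pseudo-record type


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def build_query(name: str, qtype: int = 1, dnssec_ok: bool = False,
                query_id: int = 0) -> bytes:
    """Build a DNS query; with dnssec_ok=True an OPT record with DO is added."""
    arcount = 1 if dnssec_ok else 0
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    opt = b""
    if dnssec_ok:
        # Root name, TYPE=OPT, CLASS=UDP payload size (assumed 1232),
        # TTL = extended rcode / version / flags, RDLEN=0.
        opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, 1232, DO_FLAG, 0)
    return header + question + opt


def parse_header(message: bytes) -> dict:
    """Extract the response code, AD flag and answer count from a reply."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return {"rcode": flags & 0x000F, "ad": bool(flags & AD_FLAG),
            "ancount": ancount}
```

Sending `build_query(name)` and `build_query(name, dnssec_ok=True)` for a correctly signed domain and for a deliberately broken one, then comparing `parse_header()` of the replies, gives the raw observations that the categories above are built from.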

(If you have questions regarding a particular result, please let me know. Please also let me know if you have an idea regarding HTTP code 502; it looks to me like it's used for internal timeouts.)

So, coming back to "the server supports DNSSEC", which kind of support is actually expected? In particular:

Thanks!

jedisct1 commented 3 years ago

Focusing on users' expectations, the first three are OK.

If a response is signed, but the signature doesn't match, something fishy is going on. In practice, it's almost always a DNSSEC configuration error, but assuming that it is not, responding with an error even if the DO bit is not set doesn't reduce security. It can only increase it. Setting the AD flag unconditionally allows a resolver to cache the signatures, so that a second query is not required later if the same question is asked with the AD flag. The resolver can still strip the RRSIG/NSEC/etc. information to respond to clients that didn't ask for them.

Same thing with an HTTP error for a response: we have to look at it from a client perspective. What really matters is that the client doesn't get a valid response, since that would defeat the purpose of DNSSEC.

Queries may time out. Routing issues happen. Misconfigured/down authoritative servers happen. Recursive servers get rate limited. And international->national traffic may also be heavily throttled, as in China. Servers should implement negative caching, but we can't exclude them from the list if they don't. Clients such as dnscrypt-proxy do negative caching on their end though, so it mitigates this.
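To illustrate what client-side negative caching means here, the following is a minimal sketch that remembers failed lookups for a short time so they are not retried immediately. It is not how dnscrypt-proxy implements it; the class name, the 60-second default TTL and the injectable clock are assumptions made for the example.

```python
import time


class NegativeCache:
    """Remember failed (name, qtype) lookups for a limited time."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._expiry = {}        # (name, qtype) -> time the entry expires

    def remember_failure(self, name: str, qtype: int) -> None:
        self._expiry[(name.lower(), qtype)] = self._clock() + self._ttl

    def is_known_failure(self, name: str, qtype: int) -> bool:
        key = (name.lower(), qtype)
        expires_at = self._expiry.get(key)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expiry[key]  # entry is stale, allow a fresh query
            return False
        return True
```

A client would call `remember_failure()` after a SERVFAIL or timeout and check `is_known_failure()` before sending the same question upstream again.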

4), on the other hand, violates users' expectations.

taam commented 3 years ago

Setting the AD flag even when not asked is probably not a problem; also, the RFC says "SHOULD", which is why I put this in the "passed" category. I can agree here.

However, I'm not sure if I do agree regarding the HTTP errors. Even if this might not be a direct security issue, some thoughts:

It's at least very ugly, I think, and I personally would want to avoid such servers, but I can somewhat understand if you want to be a bit more relaxed here. However, I think that in any case the documentation should reflect the requirements more accurately.

Finally, I'm not sure how to continue with the server tests. I was only updating my little DNS test/benchmark script when I fell into this rabbit hole. Does someone maybe plan to create a more elaborate/refined test? With some additional work I can list some servers that failed tests, but it seems this is something that should ideally run more permanently, checking servers over time as they change. To give an example: using the (single location) servers from #520, I currently see only 9 validating by default (with 503 on error) and 5 only when asked. If you are using the anycast address, good luck testing this.

taam commented 2 years ago

May I ask why this has been closed? Has the issue been solved? (The analysis took quite some time, I hope it was not for nothing...)

jedisct1 commented 2 years ago

With the DNSSEC bit set, the natural expectation is that the resolver is always validating.

When a response doesn't validate, DNSCrypt servers return the original error code (SERVFAIL) while DoH servers may return different response codes. In any case, it's hard to make the distinction between a temporary error and a DNSSEC error, so dnscrypt-proxy retries. This is fine. Ultimately, all we want is to not return records with an OK response code to client applications.

If servers have the DNSSEC bit set, but do not do DNSSEC validation, or do it only when explicitly asked, the DNSSEC bit should not be set.

Such a check can be part of the CI check that runs when a new PR is submitted.
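As a rough sketch of the decision such a check could make, the following classifies one server from three probe results, following the rule stated in this comment: a broken domain must never come back with an OK answer on a plain query, and SERVFAIL or an HTTP error both count as an acceptable refusal. The type names, the three probe fields and the result strings are hypothetical; this is not the existing CI script.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ANSWER = "answer"          # OK response code with records
    SERVFAIL = "servfail"
    HTTP_ERROR = "http_error"  # e.g. a DoH server replying 502/503
    TIMEOUT = "timeout"


@dataclass
class Probe:
    good_default: Outcome  # correctly signed domain, plain query
    bad_default: Outcome   # deliberately broken domain, plain query
    bad_with_do: Outcome   # deliberately broken domain, DO bit set


def classify(probe: Probe) -> str:
    """Decide whether the DNSSEC flag in a stamp looks justified."""
    if probe.good_default is not Outcome.ANSWER:
        return "failed: no answer for good domain"
    if probe.bad_default is Outcome.ANSWER:
        if probe.bad_with_do is Outcome.ANSWER:
            return "failed: does not validate"
        return "failed: validates only when asked"
    if probe.bad_default is Outcome.TIMEOUT:
        # A timeout cannot be told apart from a network problem.
        return "unknown: timeout for bad domain"
    return "passed"


def dnssec_flag_justified(probe: Probe) -> bool:
    return classify(probe) == "passed"
```

Because a single probe can hit different backends behind a load balancer or anycast address, a real check would likely need to repeat this several times and treat any failing run as a failure.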

taam commented 2 years ago

Extending the CI check script would be nice, I think, not only for new servers, but ideally to check all existing servers on a regular basis (ideally often and from different locations due to load balancing/anycast/...). As the tests have shown, some of the current servers with the DNSSEC flag set do not validate by default, and some don't validate even when asked to do so.

(However, I have to disagree regarding "This is fine." Even if you don't mind my last two arguments, if this causes unnecessary timeouts/heavy delays, then it's a usability problem, and therefore not fine for me. I actually stopped using DNSCrypt a while ago due to a similar problem, as I was repeatedly troubled by a server with load balancing issues - which I know because I was testing it with the admins - leading to high timeout rates, despite being very fast otherwise. So even if one considers a timeout acceptable in terms of signalling an error, it might not be acceptable overall, because eventually timeouts can hurt the user experience. So if you want, maybe it should be considered less of a DNSSEC problem and more of a possible user experience problem.)