TechnitiumSoftware / DnsServer

Technitium DNS Server
https://technitium.com/dns/
GNU General Public License v3.0

Intermittent resolver failures #608

Closed ZzZombo closed 1 year ago

ZzZombo commented 1 year ago

I use dnscrypt-proxy as a DNS-over-HTTPS forwarder with TDNS. For some reason, this setup has recently become largely unusable. This is how it looks on the TDNS side:

[2023-04-15 11:39:36 Local] [127.0.0.1:64926] [UDP] QNAME: update.vivaldi.com; QTYPE: A; QCLASS: IN; RCODE: ServerFailure; ANSWER: []
[2023-04-15 11:39:36 Local] [192.168.0.2:64926] [UDP] QNAME: update.vivaldi.com; QTYPE: A; QCLASS: IN; RCODE: ServerFailure; ANSWER: []
[2023-04-15 11:39:36 Local] [192.168.0.3:64926] [UDP] QNAME: update.vivaldi.com; QTYPE: A; QCLASS: IN; RCODE: ServerFailure; ANSWER: []
[2023-04-15 11:39:37 Local] [192.168.0.3:64926] [UDP] QNAME: update.vivaldi.com; QTYPE: A; QCLASS: IN; RCODE: ServerFailure; ANSWER: []
[2023-04-15 11:39:37 Local] [127.0.0.1:64926] [UDP] QNAME: update.vivaldi.com; QTYPE: A; QCLASS: IN; RCODE: ServerFailure; ANSWER: []

At the same time, dnscrypt-proxy resolves those as expected w/o any issues:

[2023-04-15 11:39:38]   127.0.0.1   update.vivaldi.com  A   PASS    132ms   scaleway-fr
[2023-04-15 11:39:38]   127.0.0.1   update.vivaldi.com  A   PASS    138ms   scaleway-fr

My point is that, for no apparent reason, DNS lookups in this configuration have suddenly become prone to unpredictable failures. Roughly 30-40% of resolution attempts succeed, and the rest fail.
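
To put a number on that failure rate, the query log can be tallied by RCODE. The following is a minimal sketch, assuming the log lines always follow the layout of the samples above; the function names `rcode_counts` and `failure_ratio` are hypothetical helpers, not part of TDNS.

```python
import re
from collections import Counter

# Pattern inferred only from the query-log samples in this thread;
# other TDNS versions or settings may log a different layout.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<client>[^\]]+)\] \[(?P<proto>\w+)\] "
    r"QNAME: (?P<qname>[^;]+); QTYPE: (?P<qtype>[^;]+); "
    r"QCLASS: (?P<qclass>[^;]+); RCODE: (?P<rcode>[^;]+);"
)


def rcode_counts(lines):
    """Count RCODE values over the lines that match the query-log pattern."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("rcode")] += 1
    return counts


def failure_ratio(lines):
    """Return the share of matched queries that ended in ServerFailure."""
    counts = rcode_counts(lines)
    total = sum(counts.values())
    return counts["ServerFailure"] / total if total else 0.0
```

Feeding it the lines of a query log file (for example `failure_ratio(open(path, encoding="utf-8"))`, with `path` pointing at your own log) gives the fraction of logged queries that failed.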

ZzZombo commented 1 year ago

Note: to combat this, I've made dnscrypt-proxy listen on more addresses for its DoH server and added those as forwarders as well, and it appears to make a positive difference.

ShreyasZare commented 1 year ago

Thanks for the post. Your setup is not clear to me. Do you have Technitium DNS Server running locally with a UDP forwarder configured to forward requests to another locally running dnscrypt-proxy?

Do you see any errors in the DNS logs? ServerFailure is a generic error response for any issue with resolution. Check the cache for update.vivaldi.com and see what negative response is cached there. Use the DNS Client tab to query for update.vivaldi.com and check whether the response contains any Extended DNS Errors that explain the reason. Post anything you see here.

ZzZombo commented 1 year ago

Local Technitium DNS server runs alongside local dnscrypt-proxy, configured to use the latter as a DoH forwarder. I see different errors:

[2023-04-15 13:47:59 Local] DNS Server failed to resolve the request with QNAME: crl.pki.goog; QTYPE: A; QCLASS: IN; Forwarders: https://localhost:2000/dns-query
System.Net.Http.HttpRequestException: An error occurred while sending the request.
 ---> System.Net.Http.HttpProtocolException: The HTTP/2 server reset the stream. HTTP/2 error code 'INTERNAL_ERROR' (0x2).
   at System.Net.Http.Http2Connection.ThrowRequestAborted(Exception innerException)
   at System.Net.Http.Http2Connection.Http2Stream.CheckResponseBodyState()
   at System.Net.Http.Http2Connection.Http2Stream.TryEnsureHeaders()
   at System.Net.Http.Http2Connection.Http2Stream.ReadResponseHeadersAsync(CancellationToken cancellationToken)
   at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at TechnitiumLibrary.Net.Dns.ClientConnection.HttpsClientConnection.QueryAsync(DnsDatagram request, Int32 timeout, Int32 retries, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\ClientConnection\HttpsClientConnection.cs:line 192
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass67_0.<<InternalResolveAsync>g__DoResolveAsync|1>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4034
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass67_0.<<InternalResolveAsync>g__DoResolveAsync|1>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4212
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass67_0.<<InternalResolveAsync>g__DoResolveAsync|1>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 3962
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalResolveAsync(DnsDatagram request, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4312
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalResolveAsync(DnsDatagram request, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4312
   at TechnitiumLibrary.Net.Dns.DnsClient.GetDnsKeyForAsync(IReadOnlyList`1 lastDSRecords, DnsClient dnsClient, IDnsCache cache, UInt16 udpPayloadSize, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 2924
   at TechnitiumLibrary.Net.Dns.DnsClient.FindDnsKeyForAsync(String ownerName, DnsClass class, IReadOnlyList`1 currentDnsKeyRecords, DnsClient dnsClient, IDnsCache cache, UInt16 udpPayloadSize, DnsDatagram originalResponse, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 2815
   at TechnitiumLibrary.Net.Dns.DnsClient.DnssecValidateResponseAsync(DnsDatagram response, IReadOnlyList`1 lastDSRecords, DnsClient dnsClient, IDnsCache cache, UInt16 udpPayloadSize, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 2524
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalDnssecResolveAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4359
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass71_0.<<InternalCachedResolveQueryAsync>b__0>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4471
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.ResolveQueryAsync(DnsQuestionRecord question, Func`2 resolveAsync) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 3891
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalCachedResolveQueryAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4472
   at DnsServerCore.Dns.DnsServer.RecursiveResolveAsync(DnsQuestionRecord question, NetworkAddress eDnsClientSubnet, IReadOnlyList`1 conditionalForwarders, Boolean dnssecValidation, Boolean cachePrefetchOperation, Boolean cacheRefreshOperation, Boolean skipDnsAppAuthoritativeRequestHandlers, TaskCompletionSource`1 taskCompletionSource) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Dns\DnsServer.cs:line 2894

But that accounts for only 20 errors. Even though I cleared all old logs beforehand, that is still too few to explain everything.

Another:

[2023-04-15 14:22:37 Local] DNS Server failed to resolve the request with QNAME: google.com; QTYPE: AAAA; QCLASS: IN; Forwarders: https://localhost:2000/dns-query, https://localhost:2001/dns-query, https://localhost:2002/dns-query, https://localhost:2003/dns-query, https://localhost:2004/dns-query, https://localhost:2005/dns-query;
TechnitiumLibrary.Net.Dns.DnsClientResponseDnssecValidationException: DNSSEC validation failed due to missing RRSIG for owner name: com/SOA
   at TechnitiumLibrary.Net.Dns.DnsClient.DnssecValidateSignature(DnsDatagram response, IReadOnlyList`1 records, IReadOnlyList`1 dnsKeyRecords, IReadOnlyList`1 unsignedZones, Boolean isAuthoritySection, Boolean isAdditionalSection) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 2773
   at TechnitiumLibrary.Net.Dns.DnsClient.DnssecValidateSignature(DnsDatagram response, IReadOnlyList`1 dnsKeyRecords, IReadOnlyList`1 unsignedZones) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 2564
   at TechnitiumLibrary.Net.Dns.DnsClient.DnssecValidateResponseAsync(DnsDatagram response, IReadOnlyList`1 lastDSRecords, DnsClient dnsClient, IDnsCache cache, UInt16 udpPayloadSize, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 2524
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalDnssecResolveAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4359
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass71_0.<<InternalCachedResolveQueryAsync>b__0>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4471
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.ResolveQueryAsync(DnsQuestionRecord question, Func`2 resolveAsync) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 3891
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalCachedResolveQueryAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4472
   at DnsServerCore.Dns.DnsServer.RecursiveResolveAsync(DnsQuestionRecord question, NetworkAddress eDnsClientSubnet, IReadOnlyList`1 conditionalForwarders, Boolean dnssecValidation, Boolean cachePrefetchOperation, Boolean cacheRefreshOperation, Boolean skipDnsAppAuthoritativeRequestHandlers, TaskCompletionSource`1 taskCompletionSource) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Dns\DnsServer.cs:line 2894

Note that dnscrypt-proxy is set to require DNSSEC support from all upstream resolvers, so I'm not sure what's going on here. It seems unlikely that a domain such as Google's would be set up incorrectly.

[2023-04-15 14:14:46 Local] DNS Server failed to resolve the request with QNAME: www.linkedin.com; QTYPE: A; QCLASS: IN; Forwarders: https://localhost:2000/dns-query, https://localhost:2001/dns-query, https://localhost:2002/dns-query, https://localhost:2003/dns-query, https://localhost:2004/dns-query, https://localhost:2005/dns-query;
TechnitiumLibrary.Net.Dns.DnsClientNoResponseException: DnsClient failed to resolve the request: request timed out.
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalResolveAsync(DnsDatagram request, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4312
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalResolveAsync(DnsDatagram request, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4312
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalDnssecResolveAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4359
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass71_0.<<InternalCachedResolveQueryAsync>b__0>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4471
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.ResolveQueryAsync(DnsQuestionRecord question, Func`2 resolveAsync) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 3891
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalCachedResolveQueryAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4472
   at DnsServerCore.Dns.DnsServer.RecursiveResolveAsync(DnsQuestionRecord question, NetworkAddress eDnsClientSubnet, IReadOnlyList`1 conditionalForwarders, Boolean dnssecValidation, Boolean cachePrefetchOperation, Boolean cacheRefreshOperation, Boolean skipDnsAppAuthoritativeRequestHandlers, TaskCompletionSource`1 taskCompletionSource) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Dns\DnsServer.cs:line 2894

But the query logs from dnscrypt-proxy seem to contradict it:

[2023-04-14 14:12:46]   127.0.0.1   www.linkedin.com    A   PASS    122ms   v.dnscrypt.uk-ipv4
[2023-04-14 14:12:46]   127.0.0.1   ocsp.rootca1.amazontrust.com    A   PASS    125ms   v.dnscrypt.uk-ipv4
[2023-04-14 14:12:46]   127.0.0.1   www.linkedin.com    A   PASS    195ms   dnscrypt.ca-1
[2023-04-14 14:12:46]   127.0.0.1   linkedin.com    DS  PASS    358ms   v.dnscrypt.uk-ipv4

All lookups around that time seem to finish quickly and w/o any errors, yet TDNS claims the request timed out. It appears timeouts are the primary cause of the failures in the log I combed through.
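
Rather than combing the log by hand, the error entries can be grouped by exception type. This is a minimal sketch, assuming every failure entry starts with the "DNS Server failed to resolve the request" line and that the exception type is on the line right after it, as in the excerpts above; `tally_exceptions` is a hypothetical helper name.

```python
from collections import Counter

# Marker text taken from the error-log excerpts in this thread.
MARKER = "DNS Server failed to resolve the request"


def tally_exceptions(lines):
    """Count the top-level exception type logged after each failure entry."""
    counts = Counter()
    expect_exception = False
    for line in lines:
        if MARKER in line:
            # The next line is assumed to carry "Namespace.Type: message".
            expect_exception = True
            continue
        if expect_exception:
            full_type = line.split(":", 1)[0].strip()
            # Keep only the short class name, without the namespace.
            counts[full_type.rsplit(".", 1)[-1]] += 1
            expect_exception = False
    return counts
```

Inner exceptions (the `--->` lines) and stack frames are skipped on purpose, so each failure entry is counted once under its outermost exception.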

ShreyasZare commented 1 year ago

Thanks for the details.

Local Technitium DNS server runs alongside local dnscrypt-proxy, configured to use the latter as a DoH forwarder.

If both are running locally, why use DoH locally? I would recommend that you use UDP instead, which will be much more efficient.

If you are not using an upstream DNS provider with dnscrypt protocol then I would recommend that you drop the dnscrypt-proxy and directly consume the upstream DNS service with Technitium DNS server.

[2023-04-15 13:47:59 Local] DNS Server failed to resolve the request with QNAME: crl.pki.goog; QTYPE: A; QCLASS: IN; Forwarders: https://localhost:2000/dns-query
System.Net.Http.HttpRequestException: An error occurred while sending the request.
 ---> System.Net.Http.HttpProtocolException: The HTTP/2 server reset the stream. HTTP/2 error code 'INTERNAL_ERROR' (0x2).

The first error snippet above says that the HTTP/2 server reset the stream, which means that dnscrypt-proxy is aborting the request on its side for some unknown reason.

[2023-04-15 14:22:37 Local] DNS Server failed to resolve the request with QNAME: google.com; QTYPE: AAAA; QCLASS: IN; Forwarders: https://localhost:2000/dns-query, https://localhost:2001/dns-query, https://localhost:2002/dns-query, https://localhost:2003/dns-query, https://localhost:2004/dns-query, https://localhost:2005/dns-query;
TechnitiumLibrary.Net.Dns.DnsClientResponseDnssecValidationException: DNSSEC validation failed due to missing RRSIG for owner name: com/SOA

The next error snippet above says that DNSSEC validation failed due to a missing RRSIG for owner name com/SOA, which means that dnscrypt-proxy did not return the RRSIG record for the com zone. This has nothing to do with whether Google is set up correctly. Google.com is unsigned, but com is signed, and that has to be validated independently.

dnscrypt-proxy being configured to validate DNSSEC is a different thing. It is also supposed to pass DNSSEC-related records on to downstream clients that are validating. So it seems that dnscrypt-proxy does not support DNSSEC-validating clients. To fix this, you should disable DNSSEC validation on Technitium DNS Server.
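
For context, a validating client asks for RRSIG and other DNSSEC records by setting the DO ("DNSSEC OK") bit in the EDNS0 OPT record of its query (RFC 3225, RFC 6891). The sketch below builds such a query in wire format with the standard library only. It is an illustration of the protocol, not Technitium's code; the helper names, the query ID and the 1232-byte UDP payload size are assumptions chosen for the example.

```python
import struct

DO_BIT = 0x8000      # "DNSSEC OK" flag inside the OPT record's TTL field
TYPE_OPT = 41        # EDNS0 pseudo-record type
FLAG_RD = 0x0100     # "recursion desired" header flag


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += struct.pack("B", len(raw)) + raw
    return out + b"\x00"


def build_query(name, qtype, dnssec_ok, query_id=0x1234, udp_size=1232):
    """Build a DNS query with one question and one EDNS0 OPT record."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT).
    header = struct.pack("!HHHHHH", query_id, FLAG_RD, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT reuses CLASS for the UDP payload size and TTL for
    # extended RCODE (8 bits), version (8 bits) and flags (16 bits).
    ttl = DO_BIT if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, ttl, 0)
    return header + question + opt
```

A forwarder that sits in front of a validating resolver has to honour this bit and return the RRSIG records it received from upstream. Comparing the responses to `build_query("google.com", 28, True)` and `build_query("google.com", 28, False)` sent to the forwarder is one way to check whether it does.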

[2023-04-15 14:14:46 Local] DNS Server failed to resolve the request with QNAME: www.linkedin.com; QTYPE: A; QCLASS: IN; Forwarders: https://localhost:2000/dns-query, https://localhost:2001/dns-query, https://localhost:2002/dns-query, https://localhost:2003/dns-query, https://localhost:2004/dns-query, https://localhost:2005/dns-query;
TechnitiumLibrary.Net.Dns.DnsClientNoResponseException: DnsClient failed to resolve the request: request timed out.

In the above error, the issue is that the request timed out. However, your dnscrypt-proxy logs say otherwise. This can happen because you have 6 DoH forwarders configured, and the DNS Server tries to get a response from all of them concurrently. If it receives a response that fails validation, that response is ignored. In the end, if no positive response is available, the last error that was recorded is the one that gets logged. So there was one DoH forwarder at the end which did not respond in time, and that is what was logged.
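
The behaviour described above can be sketched as a small simulation. This is not the server's actual implementation; it only illustrates, under simplifying assumptions, why the logged error can be a timeout even when other forwarders answered quickly. `resolve_via_forwarders`, `ValidationError` and the two simulated forwarders are hypothetical names, and validation is folded into the forwarder call for brevity.

```python
import asyncio


class ValidationError(Exception):
    """Stands in for a response that was received but failed validation."""


async def resolve_via_forwarders(forwarders, timeout):
    """Query all forwarders concurrently and return the first valid answer.

    If none succeeds, raise the last error that was recorded.
    """
    async def attempt(forwarder):
        return await asyncio.wait_for(forwarder(), timeout)

    tasks = [asyncio.ensure_future(attempt(f)) for f in forwarders]
    last_error = None
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except asyncio.TimeoutError:
                last_error = TimeoutError("request timed out")
            except Exception as exc:  # e.g. a validation failure
                last_error = exc
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
    if last_error is None:
        raise ValueError("no forwarders configured")
    raise last_error


async def fast_but_invalid():
    """Simulated forwarder: answers quickly, but the answer fails validation."""
    await asyncio.sleep(0.01)
    raise ValidationError("missing RRSIG")


async def slow():
    """Simulated forwarder: would answer, but only after the timeout."""
    await asyncio.sleep(1.0)
    return "answer"
```

Running `asyncio.run(resolve_via_forwarders([fast_but_invalid, slow], 0.05))` ends in a timeout error, because the timeout is the last failure recorded, although the first forwarder replied almost immediately with an answer that was rejected.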

I would recommend that you use UDP instead of DoH here and disable DNSSEC validation in Technitium. That should fix most of the issues you see.

ZzZombo commented 1 year ago

If you are not using an upstream DNS provider with dnscrypt protocol then I would recommend that you drop the dnscrypt-proxy and directly consume the upstream DNS service with Technitium DNS server.

But I do not want to manually go through the same process that dnscrypt-proxy does in regards to selecting the upstream resolvers, so unless I'm overlooking how to replicate that functionality in TDNS I can't do that.

Alas, I do want to switch to UDP for dnscrypt-proxy, but issue #609 stops me from doing it.

ZzZombo commented 1 year ago

I suppose this issue can be closed now that I have a working setup by switching from DoH to UDP for the forwarder. Thanks for your insight and time!

ShreyasZare commented 1 year ago

But I do not want to manually go through the same process that dnscrypt-proxy does in regards to selecting the upstream resolvers, so unless I'm overlooking how to replicate that functionality in TDNS I can't do that.

If you are using DoT or DoH upstreams with dnscrypt-proxy, then just copy and paste those URLs directly as the forwarders for Technitium DNS Server and it will work.