TechnitiumSoftware / TechnitiumLibrary

A library for .net based applications.
https://technitium.com
GNU General Public License v3.0

Connection Pooling causes DNS to be unavailable on internet connection failover #12

Closed by cdemi 1 month ago

cdemi commented 1 month ago

Summary: I have 2 internet connections (1 primary and 1 backup for failover). I am running Technitium as a Proxmox LXC on Ubuntu with the install script.

I am using Cloudflare and Google DoT (I also tried with DoH) as forwarders.

When my primary internet connection goes down and it fails over, DNS resolution to forwarders stops working until I restart the Technitium container.

The internet connection is available again after a few seconds: I can ping from inside the container running Technitium, and a manual nslookup against Google and Cloudflare returns a resolution. I can also use the built-in Technitium DNS client; if I choose a public resolver I get a response, but if I choose This Server it doesn't resolve.

[2024-10-04 09:03:37 UTC] DNS Server failed to resolve the request 'api.pushover.net. A IN' using forwarders: https://dns.google/dns-query (8.8.8.8), https://dns.google/dns-query (8.8.4.4), https://cloudflare-dns.com/dns-query (1.1.1.1), https://cloudflare-dns.com/dns-query (1.0.0.1).
TechnitiumLibrary.Net.Dns.DnsClientNoResponseException: DnsClient failed to resolve the request 'api.pushover.net. A IN': request timed out for name servers [https://dns.google/dns-query (8.8.4.4), https://dns.google/dns-query (8.8.8.8), https://cloudflare-dns.com/dns-query (1.0.0.1), https://cloudflare-dns.com/dns-query (1.1.1.1)].
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalResolveAsync(DnsDatagram request, Func`3 getValidatedResponseAsync, Boolean doNotReorderNameServers, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4794
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalResolveAsync(DnsDatagram request, Func`3 getValidatedResponseAsync, Boolean doNotReorderNameServers, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4780
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalDnssecResolveAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4896
   at TechnitiumLibrary.Net.Dns.DnsClient.<>c__DisplayClass97_0.<<InternalCachedResolveQueryAsync>b__0>d.MoveNext() in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4995
--- End of stack trace from previous location ---
   at TechnitiumLibrary.Net.Dns.DnsClient.ResolveQueryAsync(DnsQuestionRecord question, Func`2 resolveAsync) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4254
   at TechnitiumLibrary.Net.Dns.DnsClient.InternalCachedResolveQueryAsync(DnsQuestionRecord question, CancellationToken cancellationToken) in Z:\Technitium\Projects\TechnitiumLibrary\TechnitiumLibrary.Net\Dns\DnsClient.cs:line 4977
   at DnsServerCore.Dns.DnsServer.DefaultRecursiveResolveAsync(DnsQuestionRecord question, NetworkAddress eDnsClientSubnet, IDnsCache dnsCache, Boolean dnssecValidation, Boolean skipDnsAppAuthoritativeRequestHandlers, CancellationToken cancellationToken) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Dns\DnsServer.cs:line 3343
   at DnsServerCore.Dns.DnsServer.RecursiveResolverBackgroundTaskAsync(DnsQuestionRecord question, NetworkAddress eDnsClientSubnet, Boolean advancedForwardingClientSubnet, IReadOnlyList`1 conditionalForwarders, Boolean dnssecValidation, Boolean cachePrefetchOperation, Boolean cacheRefreshOperation, Boolean skipDnsAppAuthoritativeRequestHandlers, TaskCompletionSource`1 taskCompletionSource) in Z:\Technitium\Projects\DnsServer\DnsServerCore\Dns\DnsServer.cs:line 3127

I suspect that Technitium might still be holding the old HTTP/TCP connection in the connection pool and takes a long time to realize it's been terminated ungracefully and doesn't try to establish a new one.

When using DNS-over-UDP, the problem does not occur. I assume it's because UDP is a connectionless protocol and there is no connection pooler involved.
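
To illustrate the suspicion above, here is a toy model of a connection pool in Python. It is only a sketch and not Technitium's code: `FakeConnection`, `Pool`, and the `path_alive` flag are hypothetical names made up for this example. It shows why a pooled connection whose path has silently died keeps being reused until a query on it times out.

```python
class FakeConnection:
    """Stand-in for a pooled DoT/DoH connection (hypothetical, not a Technitium class)."""

    def __init__(self):
        # Set to False to simulate the network path dying without a FIN or RST.
        self.path_alive = True

    def query(self, name, timeout):
        if not self.path_alive:
            # Nothing tells the client the connection is dead, so it can only time out.
            raise TimeoutError(f"no response for {name} within {timeout}s")
        return f"answer for {name}"


class Pool:
    """Reuses one connection per server and replaces it only after a failure."""

    def __init__(self, connect):
        self._connect = connect  # callable that opens a new connection to a server
        self._conns = {}

    def query(self, server, name, timeout=2.0):
        conn = self._conns.get(server)
        if conn is None:
            conn = self._conns[server] = self._connect(server)
        try:
            return conn.query(name, timeout)
        except TimeoutError:
            # Evict the stale connection so the next query dials a fresh one.
            del self._conns[server]
            raise
```

In this model, a query made right after the failover fails with a timeout, and only the following query gets a fresh connection. A UDP client has no such per-server state to go stale, which would match the behaviour I see.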

ShreyasZare commented 1 month ago

Thanks for the post. Yes, the DoT/DoH connections are pooled, and it may take a few seconds for the DNS client code to realize that a connection is not responding before it tries to make a new one. This is more or less expected with a connection-oriented protocol. Usually the remote end responds with a RST packet, which causes the connection to be dropped immediately, but in your case it seems the server is silently dropping the incoming packets due to the IP address mismatch.

I can decrease the Send Timeout values to make it drop the connection earlier, but this issue will still occur for at least a few tens of seconds and may cause the DNS server to cache the failure, which expires after 10 seconds.
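
As a rough illustration of the failure caching mentioned here, the following Python sketch remembers a failed question for a fixed time. It is an assumption-laden toy, not the DNS server's implementation: the `FailureCache` class and its methods are hypothetical, and only the 10 second expiry comes from the comment above.

```python
import time


class FailureCache:
    """Remembers recent resolution failures for a short, fixed time."""

    def __init__(self, ttl_s=10.0, clock=time.monotonic):
        self._ttl = ttl_s
        self._clock = clock  # injectable so the expiry can be tested without sleeping
        self._failed_at = {}

    def record_failure(self, question):
        self._failed_at[question] = self._clock()

    def is_failed(self, question):
        failed_at = self._failed_at.get(question)
        if failed_at is None:
            return False
        if self._clock() - failed_at >= self._ttl:
            # The entry has expired, so the next lookup may go upstream again.
            del self._failed_at[question]
            return False
        return True
```

With a cache like this, clients can keep receiving failures for up to the expiry time even after the underlying connection has recovered.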

cdemi commented 1 month ago

Thanks for your response!

Is that referencing this configuration parameter? [image]

ShreyasZare commented 1 month ago

You're welcome!

> Is that referencing this configuration parameter?

Those are used only for inbound requests. The outbound requests are done separately by the DNS Client code in Technitium Library project.

cdemi commented 1 month ago

Makes sense! Maybe it would be a good idea to have these exposed in the DNS Server as configuration parameters for the DNS Client, because I understand that it doesn't make sense to change it for everyone as I'm sure not everyone has the same use case

ShreyasZare commented 1 month ago

> Makes sense! Maybe it would be a good idea to have these exposed in the DNS Server as configuration parameters for the DNS Client, because I understand that it doesn't make sense to change it for everyone as I'm sure not everyone has the same use case

Yes, will evaluate whether changing the defaults works for all scenarios or whether the options need to be configurable.

ShreyasZare commented 1 month ago

Technitium DNS Server v13.1 is now available and fixes this issue by enabling the TCP keep-alive option. The DNS server will now detect connection issues within around 16 seconds. Do update and let me know your feedback.
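
For readers unfamiliar with TCP keep-alive, here is a minimal Python sketch of how the option is typically enabled on a socket and how the detection time is estimated. The helper name `enable_keepalive` and the values (10 s idle, 2 s interval, 3 probes) are illustrative assumptions chosen to add up to roughly the figure quoted above; they are not confirmed to be the values the DNS server uses.

```python
import socket


def enable_keepalive(sock, idle_s=10, interval_s=2, probes=3):
    """Enable TCP keep-alive and return the approximate detection time in seconds.

    The default values are illustrative assumptions, not Technitium's settings.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform-specific, so set them only where present.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    # An idle dead connection is dropped after about idle + interval * probes.
    return idle_s + interval_s * probes
```

With these example values the estimate is 10 + 2 * 3 = 16 seconds for an idle connection. When data is already in flight, the retransmission timeout rather than keep-alive usually determines how quickly the failure is noticed.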