getdnsapi / getdns

A modern asynchronous DNS API https://getdnsapi.net/

Add "tcp_send_timeout" option to set a TCP send data timeout to recover stuck upstream connections quickly #484

Closed maciejsszmigiero closed 3 years ago

maciejsszmigiero commented 4 years ago

When using Stubby as a system DNS over TLS resolver on an Internet connection that disconnects and reconnects from time to time, there is often a long wait (~20 minutes) after the connection comes back before DNS queries start to work again.

This is because, in this particular case, all the upstream TLS TCP connections in Stubby are stuck waiting for an upstream server response that will never arrive: the host's external IP address might have changed and/or the NAT router's connection tracking entries for these TCP connections might have been removed when the Internet connection reconnected.

By default, Linux retransmits data on a TCP connection 15 times (the `net.ipv4.tcp_retries2` default) before finally terminating it. This takes 16 to 20 minutes, which is a very long time to wait for system DNS resolution to work again. This is a real problem on weak mobile connections.
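As a rough sanity check on that figure, here is a minimal sketch of the retransmission backoff arithmetic. It assumes the usual Linux constants (a 200 ms minimum RTO that doubles on each retry and is capped at 120 s); the helper name `retransmit_lower_bound_ms` is made up for illustration and is not part of the kernel or getdns. The real RTO is derived from the measured round-trip time, so this is only a lower bound.

```c
/* Assumed Linux defaults: TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s. */
enum { RTO_MIN_MS = 200, RTO_MAX_MS = 120000 };

/*
 * Hypothetical helper: lower bound, in milliseconds, on how long the stack
 * keeps retrying before giving up on a connection after `retries`
 * retransmissions. The RTO doubles after every timeout up to RTO_MAX_MS.
 */
static long long retransmit_lower_bound_ms(int retries)
{
    long long total = 0;
    long long rto = RTO_MIN_MS;

    /* One wait after the original send, plus one after each retransmission. */
    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto *= 2;
        if (rto > RTO_MAX_MS)
            rto = RTO_MAX_MS;
    }
    return total;
}
```

Under these assumptions, `retransmit_lower_bound_ms(15)` comes to 924600 ms, or a little over 15 minutes. On a link with a larger RTT the starting RTO is higher, which is consistent with the 16 to 20 minutes observed in practice.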

Thankfully, there is a "TCP_USER_TIMEOUT" per-socket option that allows explicitly setting how long the network stack will wait in such cases.
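For illustration, setting this option on a socket might look roughly like the sketch below. It is Linux-specific, and the helper name `set_tcp_send_timeout` is hypothetical rather than taken from the getdns code. The option value is an `unsigned int` in milliseconds.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the getdns implementation): ask the kernel to
 * abort the connection if transmitted data stays unacknowledged for more
 * than timeout_ms milliseconds. Returns 0 on success, -1 with errno set.
 */
static int set_tcp_send_timeout(int fd, unsigned int timeout_ms)
{
#ifdef TCP_USER_TIMEOUT
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
#else
    /* Platform does not expose the option; report it as unsupported. */
    (void)fd;
    (void)timeout_ms;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

A caller would typically invoke this right after `socket()` and before `connect()`, for example `set_tcp_send_timeout(fd, 15000)` for a 15 second limit, and treat a failure as non-fatal.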

Let's add a matching "tcp_send_timeout" option to getdns that allows setting this option on outgoing TCP sockets. For backward compatibility the code won't try to set it by default.

With this option set to, for example, 15 seconds, Stubby recovers pretty much instantly in such cases.

saradickinson commented 4 years ago

Thanks @maciejsszmigiero. This looks useful - will review for the next release!