NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

DNS over TLS: error: SSL_handshake syscall: No route to host #673

Closed: pemensik closed this issue 2 years ago

pemensik commented 2 years ago

**Describe the bug**
The TLS channel does not wait for the socket to become ready. Even if the socket never reaches the connected state, TLS channel setup is attempted on it.

**To reproduce**
Steps to reproduce the behavior:

1. We have intentionally broken IPv6 in the office: it works only locally and cannot reach the public network.
   ```
   # ping -6 -c 2 nlnetlabs.nl
   PING nlnetlabs.nl(dicht.nlnetlabs.nl (2a04:b900::1:0:0:10)) 56 data bytes
   From 2620:xx:0:xx::3fc (2620:xx:0:xx::3fc) icmp_seq=1 Destination unreachable: Address unreachable
   From 2620:xx:0:xx::3fc (2620:xx:0:xx::3fc) icmp_seq=2 Destination unreachable: Address unreachable

   --- nlnetlabs.nl ping statistics ---
   2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1001ms
   ```


2. Create `cloudflare.conf` with the following contents:

   ```
   server:
       tls-cert-bundle: "/etc/pki/tls/certs/ca-bundle.trust.crt"

   forward-zone:
       name: "."
       forward-addr: 1.1.1.1@853
       forward-addr: 1.0.0.1@853
       forward-addr: 2606:4700:4700::1111@853
       forward-addr: 2606:4700:4700::1001@853
       forward-tls-upstream: yes
   ```

3. Run `unbound-host -C cloudflare.conf nlnetlabs.nl`:

   ```
   unbound-host -C cloudflare.conf nlnetlabs.nl
   [1652280699] libunbound[1465:0] error: SSL_handshake syscall: No route to host
   [1652280699] libunbound[1465:0] error: SSL_handshake syscall: No route to host
   nlnetlabs.nl has address 185.49.140.10
   [1652280699] libunbound[1465:0] error: SSL_handshake syscall: No route to host
   [1652280699] libunbound[1465:0] error: SSL_handshake syscall: No route to host
   nlnetlabs.nl has IPv6 address 2a04:b900::1:0:0:10
   nlnetlabs.nl mail is handled by 1 mx.soverin.net.
   ```


**Expected behavior**
Unbound should not attempt any action on an IPv6 socket until the socket is ready for writing, which should ensure that the connection succeeded. TCP and TLS are stateful protocols, so the TCP connection state should be established first. That way, TLS setup errors would not appear on networks that have local IPv6 addresses but no real IPv6 connectivity.
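For illustration, a minimal POSIX sketch of what "ready" means for a non-blocking TCP connect, assuming plain `poll(2)` rather than Unbound's actual libevent-based networking code. The socket becomes writable once the connect attempt finishes, whether or not it succeeded, so the outcome has to be read from `SO_ERROR` before any TLS handshake is started. The helper `connect_and_wait` is hypothetical and not part of Unbound.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not Unbound's API. Starts a non-blocking TCP connect
 * to an IPv4 or IPv6 address and waits for it to complete. Returns 0 once
 * the connection is established, otherwise an errno value such as
 * EHOSTUNREACH, ENETUNREACH, ECONNREFUSED or ETIMEDOUT. */
static int connect_and_wait(int fd, const struct sockaddr_storage *ss,
                            int timeout_ms)
{
    assert(ss->ss_family == AF_INET || ss->ss_family == AF_INET6);
    socklen_t addrlen = ss->ss_family == AF_INET6
        ? (socklen_t)sizeof(struct sockaddr_in6)
        : (socklen_t)sizeof(struct sockaddr_in);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return errno;

    if (connect(fd, (const struct sockaddr *)ss, addrlen) == 0)
        return 0;                   /* connected immediately */
    if (errno != EINPROGRESS)
        return errno;               /* failed immediately */

    /* Writability only says that the connect attempt has finished. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n == -1)
        return errno;
    if (n == 0)
        return ETIMEDOUT;           /* still pending after timeout_ms */

    /* SO_ERROR holds the result of the asynchronous connect. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return errno;
    return err;                     /* 0: safe to start the TLS handshake */
}
```

In this sketch, a return value of `EHOSTUNREACH` (whose `strerror` text is "No route to host") corresponds to the condition in the log above; only a return value of 0 means the TLS handshake can begin.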

**System:**
 - Unbound version: 1.15
 - OS: Fedora release 37 (Rawhide)
 - `unbound -V` output:

   ```
   Version 1.15.0

   Configure line: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pythonmodule --with-pyunbound PYTHON=/usr/bin/python3 --enable-dnstap --with-libnghttp2 --with-libevent --with-pthreads --with-ssl --disable-rpath --disable-static --enable-relro-now --enable-pie --enable-subnet --enable-ipsecmod --with-conf-file=/etc/unbound/unbound.conf --with-pidfile=/run/unbound/unbound.pid --enable-sha2 --disable-gost --enable-ecdsa --with-rootkey-file=/var/lib/unbound/root.key --enable-linux-ip-local-port-range
   Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.0.2 15 Mar 2022
   Linked modules: dns64 python ipsecmod subnetcache respip validator iterator

   BSD licensed, see LICENSE in source package for details.
   Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
   ```



wcawijngaards commented 2 years ago

Fixed the issue in the commit. Thanks for the report!

I think the code does wait for the socket to become ready. However, the socket becomes ready with an error, and that error surfaces when the system calls made during the TLS handshake interact with the socket. Unlike TCP connect failures, these TLS handshake errors were not filtered, hence the log messages. The patch fixes this by squelching the error from the logs; it is still visible at high verbosity levels.

The patch also squelches several other errors in the same way, such as host down and permission denied.
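A minimal sketch of the kind of filtering described above, assuming the decision is made per errno value: unreachable-type errors are dropped at normal verbosity and logged only when verbosity is raised. The function names, the exact error set (including `ENETUNREACH`, which the thread does not mention), the threshold `VERB_SHOW_SQUELCHED`, and the message text are all hypothetical; this is not the code of the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical verbosity level from which squelched errors are shown. */
#define VERB_SHOW_SQUELCHED 4

/* Errors meaning "this upstream is unreachable from here". The set is an
 * assumption modelled on the cases named in this thread. */
static bool is_unreachable_error(int err)
{
    switch (err) {
    case ENETUNREACH:
    case EHOSTUNREACH:
#ifdef EHOSTDOWN
    case EHOSTDOWN:                 /* not POSIX, but defined on Linux */
#endif
    case EACCES:
    case EPERM:
        return true;
    default:
        return false;
    }
}

/* Logs a failed upstream connection or TLS setup to stderr.
 * Returns true if a message was written, false if it was squelched. */
static bool log_upstream_error(int err, int verbosity)
{
    assert(err != 0);
    if (is_unreachable_error(err) && verbosity < VERB_SHOW_SQUELCHED)
        return false;
    fprintf(stderr, "error: upstream connection failed: %s\n", strerror(err));
    return true;
}
```

Errors outside the set, such as a connection reset during the handshake, are still logged at any verbosity in this sketch.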

pemensik commented 2 years ago

I can confirm that it fixed the error. Thanks!

However, I think it should still log some kind of error if I use only IPv6 addresses and none of them are usable. When I comment out the IPv4 addresses, it never reports that the error was "No route to host". That would still be useful when no other address worked. Could it perhaps save the last error and print at least a single error when no address was reachable, perhaps together with a summary of how many addresses were tried?

```
# just IPv6 addresses used
# ./unbound-host -dC ~/cloudflare.conf unbound.net
[1652282465] libunbound[10130:0] notice: init module 0: validator
[1652282465] libunbound[10130:0] notice: init module 1: iterator
[1652282465] libunbound[10130:0] info: resolving unbound.net. A IN
Host unbound.net not found: 2(SERVFAIL).
[1652282465] libunbound[10130:0] info: resolving unbound.net. AAAA IN
Host unbound.net not found: 2(SERVFAIL).
[1652282465] libunbound[10130:0] info: resolving unbound.net. MX IN
Host unbound.net not found: 2(SERVFAIL).
```
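A minimal sketch of the suggestion above, assuming failures are counted per query across the configured forward addresses: each squelched failure is recorded, and a single summary line carrying the last error is produced only when every attempted address failed. The struct, the function names, and the message text are hypothetical and do not reflect anything in Unbound.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-query record of upstream connection attempts. */
struct upstream_attempts {
    int tried;      /* addresses attempted so far */
    int failed;     /* attempts that failed to connect */
    int last_err;   /* errno of the most recent failure, 0 if none */
};

static void record_failure(struct upstream_attempts *a, int err)
{
    assert(err != 0);
    a->tried++;
    a->failed++;
    a->last_err = err;  /* the squelched error is kept, not logged */
}

static void record_success(struct upstream_attempts *a)
{
    a->tried++;
}

/* Writes one summary line into buf and returns true only when every
 * attempted address failed; returns false if any address worked or
 * nothing was tried. */
static bool summarize_failures(const struct upstream_attempts *a,
                               char *buf, size_t len)
{
    if (a->tried == 0 || a->failed < a->tried)
        return false;
    snprintf(buf, len, "error: all %d upstream addresses failed, last error: %s",
             a->tried, strerror(a->last_err));
    return true;
}
```

In this sketch, two unreachable IPv6 addresses produce a single line such as `error: all 2 upstream addresses failed, last error: No route to host` (the `strerror` text depends on the C library), while a query where any address worked stays quiet.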
pemensik commented 2 years ago

This would be quite useful when analysing Unbound logs after temporary resolution problems, for example when the default route was unavailable for some time. Currently such a condition is never reported. That is better for unbound-host, but I think not for unbound.