Closed BKPepe closed 4 years ago
On my node with kernel 5.10-rc4 (not patched through downstream) and OpenWrt's unbound package (unbound-daemon_1.12.0-1_arm_cortex-a9_vfpv3-d16.ipk), the connectivity issue partly reproduces:
error: SERVFAIL <monitor.dnsops.gov. A IN>: could not fetch nameservers for 0x20 fallback
error: SERVFAIL <main.dnsops.gov. A IN>: exceeded the maximum number of sends
error: SERVFAIL <main.dnsops.gov. A IN>: could not fetch nameservers for 0x20 fallback
error: SERVFAIL <dane-test.had.dnsops.gov. A IN>: exceeded the maximum number of sends
error: SERVFAIL <dane-test.had.dnsops.gov. A IN>: exceeded the maximum number of sends
If I do
dig dane-test.had.dnsops.gov
- unbound stops resolving DNS servers, but it is running, it does not crash.
This part does not reproduce for me, i.e. unbound still handles any other queries afterwards.
I think it's clear that the zone is bad: many of their servers don't respond at all, and the rest won't return any DNSKEY (though the DS promises one). For me, other concurrent queries (easy ones) weren't affected. Still, it's a bit weird that the SERVFAIL from Unbound either never arrived or arrived only after a very long time:
;; From 127.0.0.1@53(UDP) in 96312.3 ms
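For anyone who wants to reproduce the timing above without dig or kdig, here is a minimal Python sketch that sends one UDP query to a resolver and reports the RCODE and the elapsed time. The helper names (`build_query`, `parse_rcode`, `timed_query`), the fixed transaction ID and the 120 second default timeout are assumptions made for illustration and are not taken from this thread. It does not retry, fall back to TCP, or check that the reply's ID matches.

```python
import socket
import struct
import time

# Common RCODE values from the DNS header (low 4 bits of the flags word).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, txid=0x1234):
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(response):
    """Return the RCODE name found in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("short DNS response")
    flags = struct.unpack("!H", response[2:4])[0]
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode))


def timed_query(name, server="127.0.0.1", port=53, timeout=120.0):
    """Send one UDP query; return (rcode, seconds), rcode is None on timeout."""
    query = build_query(name)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, port))
        try:
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None, time.monotonic() - start
    return parse_rcode(response), time.monotonic() - start


# Example (needs a resolver listening on 127.0.0.1:53):
#   print(timed_query("dane-test.had.dnsops.gov"))
#   print(timed_query("nlnetlabs.nl"))  # an "easy" query for comparison
```

Running the slow query in one terminal and an easy one in another gives a rough check of whether other queries are still answered while the broken zone is being probed.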
It takes a while, yes, probably until it reaches the sends limit, but not as long as you are reporting. I just restarted unbound to clear the cache and then
Not sure whether IPv4 / IPv6 matters.
Unbound probes servers that are not responding with fairly long timeouts. The timeouts are documented here: https://www.nlnetlabs.nl/documentation/unbound/info-timeout/
So it is normal that it takes that long: there is no information about these servers yet, and unbound has the capacity to keep trying for a while. It then caches what it learned in case someone tries again.
There is logic in there to stop these kinds of long queries from swamping unbound's request list; if the list gets full, such queries are dropped earlier.
I believe this closes the issue, so am marking it as closed.
Hi guys,
While searching the Internet for a domain I could use to test DANE, I noticed there is definitely one broken domain,
dane-test.had.dnsops.gov
, which leaves unbound running but not resolving anything, so unbound has to be restarted. Looking at dnsviz.net, I see there are issues with resolving the NS names main.dnsops.gov and main.dnsops.gov. This was confirmed by using
dig +trace dane-test.had.dnsops.gov
as it fails with
If I do
dig dane-test.had.dnsops.gov
- unbound stops resolving DNS servers, but it is running, it does not crash. Let's try to ask the NS servers; there is the same problem:
When using unbound:
It seems like there is no Internet connection, but ping works; only unbound is not resolving addresses. When using Knot Resolver or asking recursive DNS resolvers such as CZ.NIC ODVR, Cloudflare, and Google DNS, they return SERVFAIL.
Domain dane-test.had.dnsops.gov while using Cloudflare
NS monitor.dnsops.gov while using Google DNS
This was tested on Unbound versions 1.11.0 and 1.12.0 using OpenWrt.
In strace, I see this and I am not sure whether it is useful:
Also, I tried to use verbose logging on Unbound, but after running it for just 2 minutes the log is incredibly large (8 MB, ~100 000 rows). If necessary, I can provide it.
Thanks go to @VojtechMyslivec, who was helping me with this issue and reproducing it on his end, too.