NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound

Trouble resolving some subdomains for the second time #962

Open hugleo opened 12 months ago

hugleo commented 12 months ago

If we resolve static.allianzparqueshop.com.br first and then try to resolve www.allianzparqueshop.com.br, the process will fail, and we'll get an empty answer for www.allianzparqueshop.com.br.

$ drill static.allianzparqueshop.com.br @xx.xx.xx.xx
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 24270
;; flags: qr rd ra ; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; static.allianzparqueshop.com.br. IN A

;; ANSWER SECTION:
static.allianzparqueshop.com.br. 300 IN CNAME 1100410k.ha.azioncdn.net.
1100410k.ha.azioncdn.net. 20 IN A 179.191.169.57
1100410k.ha.azioncdn.net. 20 IN A 179.191.168.37
1100410k.ha.azioncdn.net. 20 IN A 179.191.168.39
1100410k.ha.azioncdn.net. 20 IN A 179.191.169.81
1100410k.ha.azioncdn.net. 20 IN A 179.191.169.113
1100410k.ha.azioncdn.net. 20 IN A 179.191.168.40
1100410k.ha.azioncdn.net. 20 IN A 179.191.168.43
1100410k.ha.azioncdn.net. 20 IN A 179.191.168.44
1100410k.ha.azioncdn.net. 20 IN A 179.191.169.73
1100410k.ha.azioncdn.net. 20 IN A 179.191.168.42

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 533 msec
;; SERVER: xx.xx.xx.xx
;; WHEN: Thu Nov 9 11:31:09 2023
;; MSG SIZE rcvd: 247

$ drill www.allianzparqueshop.com.br @xx.xx.xx.xx
;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 57028
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;; www.allianzparqueshop.com.br. IN A

;; ANSWER SECTION:

;; AUTHORITY SECTION:
allianzparqueshop.com.br. 3600 IN SOA ns1.exacttarget.com. hostmaster.exacttarget.com. 2022101900 7200 3600 1209600 3600

;; ADDITIONAL SECTION:

;; Query time: 301 msec
;; SERVER: xx.xx.xx.xx
;; WHEN: Thu Nov 9 11:31:16 2023
;; MSG SIZE rcvd: 112

The problem is even worse if I enable prefetch: yes. With repeated queries to static.allianzparqueshop.com.br, we end up with the same empty response once the TTL of 20 expires. I can reproduce this on different operating systems and unbound versions.

If I forward to a public DNS, the problem does not occur:

forward-zone:
    name: "."
    forward-addr: 8.8.8.8
    forward-addr: 8.8.4.4

The problem is not occurring with a BIND server.

Debian 12: Version 1.17.1

Configure line: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --with-pythonmodule --with-pyunbound --enable-subnet --enable-dnstap --enable-systemd --with-libnghttp2 --with-chroot-dir= --with-dnstap-socket-path=/run/dnstap.sock --disable-rpath --with-pidfile=/run/unbound.pid --with-libevent --enable-tfo-client --with-rootkey-file=/usr/share/dns/root.key --enable-tfo-server
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.0.11 19 Sep 2023
Linked modules: dns64 python subnetcache respip validator iterator
TCP Fastopen feature available

Arch Linux: Version 1.19.0

Configure line: --prefix=/usr --sysconfdir=/etc --localstatedir=/var --sbindir=/usr/bin --disable-rpath --enable-dnscrypt --enable-dnstap --enable-pie --enable-relro-now --enable-subnet --enable-systemd --enable-tfo-client --enable-tfo-server --enable-cachedb --with-libhiredis --with-conf-file=/etc/unbound/unbound.conf --with-pidfile=/run/unbound.pid --with-rootkey-file=/etc/trusted-key.key --with-libevent --with-libnghttp2 --with-pyunbound
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.1.4 24 Oct 2023
Linked modules: dns64 cachedb subnetcache respip validator iterator
DNSCrypt feature available
TCP Fastopen feature available

wcawijngaards commented 12 months ago

The domain allianzparqueshop.com.br seems to return the wrong NS information.

The com.br nameserver says the delegation of the domain has these nameservers:

allianzparqueshop.com.br.       3600    IN      NS      ns-1816.awsdns-35.co.uk.
allianzparqueshop.com.br.       3600    IN      NS      ns-1008.awsdns-62.net.
allianzparqueshop.com.br.       3600    IN      NS      ns-129.awsdns-16.com.
allianzparqueshop.com.br.       3600    IN      NS      ns-1232.awsdns-26.org.

But the reported error occurs once the static.allianzparqueshop.com.br. name has been resolved, and when that name is queried the following is returned:

;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa ; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0 
;; QUESTION SECTION:
static.allianzparqueshop.com.br.        IN      A

;; ANSWER SECTION:
static.allianzparqueshop.com.br.        300     IN      CNAME   1100410k.ha.azioncdn.net.

;; AUTHORITY SECTION:
allianzparqueshop.com.br.       360     IN      NS      ns1.exacttarget.com.
allianzparqueshop.com.br.       360     IN      NS      ns2.exacttarget.com.
allianzparqueshop.com.br.       360     IN      NS      ns3.exacttarget.com.
allianzparqueshop.com.br.       360     IN      NS      ns4.exacttarget.com.

This is what the nameserver for allianzparqueshop.com.br says, from IP 205.251.199.24. Because Unbound trusts the information from the server itself, it uses the returned nameserver information for future lookups. Those lookups then go to the exacttarget.com nameservers, which return the empty response quoted in the issue report: an NXDOMAIN for the www.allianzparqueshop.com.br. name.

So the issue is that the nameservers for allianzparqueshop.com.br return the wrong NS records for their own zone. Whether other software fails depends on when, if ever, it uses the returned nameserver information that leads to the wrong content. The real fix is for the allianzparqueshop.com.br nameservers to stop returning the ns1-4.exacttarget.com nameservers.
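The mismatch described above can be checked mechanically by comparing the two NS sets. The following is only a sketch: the names are copied from the outputs quoted in this thread, and the helper names `normalize` and `compare_ns_sets` are made up for illustration; it does not query DNS itself.

```python
def normalize(names):
    """Lower-case host names and drop the trailing dot so sets compare cleanly."""
    return {name.rstrip(".").lower() for name in names}


def compare_ns_sets(parent_ns, child_ns):
    """Return (names only at the parent, names only at the child), both sorted."""
    parent, child = normalize(parent_ns), normalize(child_ns)
    return sorted(parent - child), sorted(child - parent)


# NS set from the com.br delegation, as quoted in this thread.
PARENT_NS = [
    "ns-1816.awsdns-35.co.uk.",
    "ns-1008.awsdns-62.net.",
    "ns-129.awsdns-16.com.",
    "ns-1232.awsdns-26.org.",
]

# NS set from the authority section of the zone's own answer, as quoted in this thread.
CHILD_NS = [
    "ns1.exacttarget.com.",
    "ns2.exacttarget.com.",
    "ns3.exacttarget.com.",
    "ns4.exacttarget.com.",
]

only_parent, only_child = compare_ns_sets(PARENT_NS, CHILD_NS)
if only_parent or only_child:
    print("NS mismatch between delegation and zone")
    print("  only in delegation:", ", ".join(only_parent))
    print("  only in zone:", ", ".join(only_child))
```

Here the two sets share no names at all, which matches the diagnosis: the delegation points at the awsdns servers, while the zone itself advertises the exacttarget.com servers.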

A workaround could be to put the IP addresses of the exacttarget.com nameservers in do-not-query-address: <IP> entries in the config, so that unbound no longer sends queries to those nameservers. Or add local-data for the name. Or, maybe a better option, create a stub-zone for allianzparqueshop.com.br with stub-host: <name> entries for the nameservers in the working response, the awsdns names. Then unbound uses exactly those nameservers for the name.
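As a rough sketch of the stub-zone option, the unbound.conf fragment below lists the four awsdns names from the com.br delegation quoted earlier. It is a workaround only and untested here. The commented-out do-not-query-address line uses 192.0.2.1, a documentation placeholder and not a real exacttarget.com address; the actual addresses would have to be looked up first.

```
# Workaround sketch: pin the zone to the nameservers from the delegation.
stub-zone:
    name: "allianzparqueshop.com.br."
    stub-host: ns-1816.awsdns-35.co.uk.
    stub-host: ns-1008.awsdns-62.net.
    stub-host: ns-129.awsdns-16.com.
    stub-host: ns-1232.awsdns-26.org.

# Alternative sketch (placeholder address, replace before use):
# server:
#     do-not-query-address: 192.0.2.1
```

The stub-zone needs to be removed again once the zone owner fixes the NS records, otherwise a later change of nameservers would go unnoticed.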

hugleo commented 12 months ago

Some things I also noticed:

I kept resolving with repeated drill commands in sequence (one query per second):

drill static.allianzparqueshop.com.br

;; ANSWER SECTION:
static.allianzparqueshop.com.br. 280 IN CNAME 1100410k.ha.azioncdn.net.
1100410k.ha.azioncdn.net. 0 IN A 179.191.168.43
1100410k.ha.azioncdn.net. 0 IN A 179.191.169.65
1100410k.ha.azioncdn.net. 0 IN A 179.191.168.41
1100410k.ha.azioncdn.net. 0 IN A 179.191.169.57
1100410k.ha.azioncdn.net. 0 IN A 179.191.169.97
1100410k.ha.azioncdn.net. 0 IN A 179.191.169.113
1100410k.ha.azioncdn.net. 0 IN A 179.191.168.36
1100410k.ha.azioncdn.net. 0 IN A 179.191.168.38
1100410k.ha.azioncdn.net. 0 IN A 179.191.169.89
1100410k.ha.azioncdn.net. 0 IN A 179.191.168.44

This expires, and unbound resolves correctly again, with a fresh TTL of 30.

drill static.allianzparqueshop.com.br
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 31650
;; flags: qr rd ra ; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; static.allianzparqueshop.com.br. IN A

;; ANSWER SECTION:
static.allianzparqueshop.com.br. 279 IN CNAME 1100410k.ha.azioncdn.net.
1100410k.ha.azioncdn.net. 30 IN A 179.191.168.38
1100410k.ha.azioncdn.net. 30 IN A 179.191.169.89
1100410k.ha.azioncdn.net. 30 IN A 179.191.168.44
1100410k.ha.azioncdn.net. 30 IN A 179.191.168.43
1100410k.ha.azioncdn.net. 30 IN A 179.191.169.65
1100410k.ha.azioncdn.net. 30 IN A 179.191.168.41
1100410k.ha.azioncdn.net. 30 IN A 179.191.169.57
1100410k.ha.azioncdn.net. 30 IN A 179.191.169.97
1100410k.ha.azioncdn.net. 30 IN A 179.191.169.113
1100410k.ha.azioncdn.net. 30 IN A 179.191.168.36

But on the next query, even though the TTL of 30 has not yet expired, unbound returns the empty result:

;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 65292
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;; www.allianzparqueshop.com.br. IN A

;; ANSWER SECTION:

;; AUTHORITY SECTION:
allianzparqueshop.com.br. 3146 IN SOA ns1.exacttarget.com. hostmaster.exacttarget.com. 2022101900 7200 3600 1209600 3600

Sometimes I get different results when I query NS:

azioncdn.net. 119 IN SOA ns1.azioncdn.net. admin.azion.com. 2023110916 86400 86400 86400 120
allianzparqueshop.com.br. 2997 IN SOA ns1.exacttarget.com. hostmaster.exacttarget.com. 2022101900 7200 3600 1209600 3600

I get inverted results if I query in a different order. I restart unbound each time to clear everything.

drill NS www.allianzparqueshop.com.br
azioncdn.net. 120 IN SOA ns1.azioncdn.net. admin.azion.com. 2023110916 86400 86400 86400 120

drill NS static.allianzparqueshop.com.br
allianzparqueshop.com.br. 3600 IN SOA ns1.exacttarget.com. hostmaster.exacttarget.com. 2022101900 7200 3600 1209600 3600

Restart here and query again:

drill NS static.allianzparqueshop.com.br
azioncdn.net. 120 IN SOA ns1.azioncdn.net. admin.azion.com. 2023110916 86400 86400 86400 120

drill NS www.allianzparqueshop.com.br
allianzparqueshop.com.br. 3600 IN SOA ns1.exacttarget.com. hostmaster.exacttarget.com. 2022101900 7200 3600 1209600 3600

wcawijngaards commented 12 months ago

From the output, the SOA records have serial numbers. The serial format is really a choice of the domain hoster, but in this case both look like they are based on a timestamp. The 2022101900 serial number is from ns1.exacttarget.com and the 2023110916 is from ns1.azioncdn.net. So it looks like entries from 19 Oct 2022 are causing the issue, and the new hoster, azion.com, has updated on 9 Nov 2023, 16 times. The NS entries pointing to the old hoster are still served by the zone and have not been removed yet.
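The reading of the serials above can be reproduced with a few lines of Python. This is a sketch that assumes both serials follow the common YYYYMMDDnn convention (date plus a two-digit change counter), which is a guess about the hosters' practice rather than something the serial itself guarantees. The function name `parse_soa_serial` is made up for illustration.

```python
from datetime import date


def parse_soa_serial(serial):
    """Split a YYYYMMDDnn-style SOA serial into (date, change counter).

    Raises ValueError if the leading eight digits are not a valid date,
    which suggests the serial does not follow this convention.
    """
    text = f"{serial:010d}"
    day = date(int(text[0:4]), int(text[4:6]), int(text[6:8]))
    return day, int(text[8:10])


# The two serials quoted in this thread.
for serial in (2022101900, 2023110916):
    day, counter = parse_soa_serial(serial)
    print(serial, day.isoformat(), counter)
```

For the two serials in the thread this gives 2022-10-19 with counter 0 and 2023-11-09 with counter 16.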