NLnetLabs / unbound

Unbound is a validating, recursive, and caching DNS resolver.
https://nlnetlabs.nl/unbound
BSD 3-Clause "New" or "Revised" License

Lookup for some hostnames fail the first time #692

Closed by freddieleeman 2 years ago

freddieleeman commented 2 years ago

Describe the bug: Some hostnames do not return a result on the first lookup. I have run into this issue with multiple hostnames and have not been able to find the cause.

To reproduce: Steps to reproduce the behavior:

  1. unbound-control flush_zone vladimirpivo.ru
  2. dig @127.0.0.1 _smtp._tls.vladimirpivo.ru txt +short (no results)
  3. dig @127.0.0.1 _smtp._tls.vladimirpivo.ru txt +short (results)
  4. ?
  5. profit

Expected behavior: Results on the first try

System:

gthess commented 2 years ago

Hi, dig has a timeout to wait for an answer. Unbound will keep resolving in the meantime, and the next time you ask you get the already-resolved answer. You can try increasing the dig timeout with +time=T, where T is in seconds. The default value on my system is 5 seconds. If that is not the case and Unbound actually replies with something, please try without the +short option to see the whole DNS reply.

freddieleeman commented 2 years ago

The issue does not appear to be dig- or timeout-related. Even PHP's checkdnsrr() returns false on the first try. Adding +time=60 results in the same issue:

# unbound-control flush_zone vladimirpivo.ru
ok removed 4 rrsets, 1 messages and 0 key entries

# dig @127.0.0.1 _smtp._tls.vladimirpivo.ru txt +time=60

; <<>> DiG 9.16.27-Debian <<>> @127.0.0.1 _smtp._tls.vladimirpivo.ru txt +time=60
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 41842
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_smtp._tls.vladimirpivo.ru.    IN      TXT

;; AUTHORITY SECTION:
vladimirpivo.ru.        600     IN      SOA     ns1.masterhost.ru. hostmaster.masterhost.ru. 1652770069 28800 7200 1209600 600

;; Query time: 84 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Jun 07 12:35:52 UTC 2022
;; MSG SIZE  rcvd: 117

# dig @127.0.0.1 _smtp._tls.vladimirpivo.ru txt +time=60

; <<>> DiG 9.16.27-Debian <<>> @127.0.0.1 _smtp._tls.vladimirpivo.ru txt +time=60
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44768
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_smtp._tls.vladimirpivo.ru.    IN      TXT

;; ANSWER SECTION:
_smtp._tls.vladimirpivo.ru. 900 IN      TXT     "v=TLSRPTv1; rua=mailto:tlsrpt@dukx7k3q.uriports.com"

;; Query time: 92 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Jun 07 12:35:54 UTC 2022
;; MSG SIZE  rcvd: 119

gthess commented 2 years ago

Now I see. The nameservers responsible for this domain wrongly respond with NXDOMAIN to queries for empty non-terminal DNS nodes (in this case _tls.vladimirpivo.ru). Unbound asks that question because qname-minimisation is on by default. Unbound interprets the NXDOMAIN as meaning that nothing exists below that name, so the resolution ends there. Subsequent queries continue qname-minimisation from that point on and eventually get the answer. If you had DNSSEC validation enabled in Unbound, then, since this domain is DNSSEC signed, the NXDOMAIN answer would also stick for future queries, rendering that query unresolvable.

You can also see the error here: https://dnsviz.net/d/_smtp._tls.vladimirpivo.ru/Yp9I0g/dnssec/.
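
As a rough illustration of the explanation above, the following Python sketch lists the names a minimising resolver would ask about, one label at a time, and stops at the first NXDOMAIN. The `AUTHORITATIVE_ANSWERS` table is a hypothetical stand-in for the misbehaving nameservers, and the sketch ignores delegations, query types, and caching; it is a simplified model, not Unbound's implementation.

```python
def minimised_names(qname):
    """Return the successively longer names a minimising resolver asks for."""
    labels = qname.rstrip(".").split(".")
    # Build names from the rightmost label outward: ru., vladimirpivo.ru., ...
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


# Hypothetical responses, modelled on the behavior reported in this thread:
# the empty non-terminal _tls.vladimirpivo.ru. wrongly yields NXDOMAIN.
AUTHORITATIVE_ANSWERS = {
    "ru.": "NOERROR",
    "vladimirpivo.ru.": "NOERROR",
    "_tls.vladimirpivo.ru.": "NXDOMAIN",  # a correct server would not say this
    "_smtp._tls.vladimirpivo.ru.": "NOERROR",
}


def resolve(qname):
    """Walk the minimised names and stop at the first NXDOMAIN."""
    asked = []
    for name in minimised_names(qname):
        asked.append(name)
        rcode = AUTHORITATIVE_ANSWERS.get(name, "NXDOMAIN")
        if rcode == "NXDOMAIN":
            # Nothing is assumed to exist below this name, so give up here.
            return rcode, asked
    return "NOERROR", asked


rcode, asked = resolve("_smtp._tls.vladimirpivo.ru")
print(rcode, asked)
```

In this model the full name `_smtp._tls.vladimirpivo.ru.` is never asked for, which matches the NXDOMAIN seen in the first dig output.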

If you encounter a lot of those, turning off qname-minimisation will help, at the cost of some privacy.

If you don't want to do DNSSEC validation (I think you currently have it disabled, since I don't see Unbound validating the responses it returns to dig), you can remove the "validator" module altogether from the module-config option; that would allow Unbound to look further in that first query.

Contacting the operators of those nameservers may solve the issue in the long run.

freddieleeman commented 2 years ago

Thank you, I resolved the issue by adding qname-minimisation: no to the config. Thank you for your assistance.
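
For reference, this option belongs in the `server:` clause of unbound.conf. A minimal fragment might look like the following; the rest of the configuration is assumed to be unchanged.

```
server:
    # Send the full query name to upstream nameservers. This is less
    # private, but avoids stopping at an NXDOMAIN for an empty non-terminal.
    qname-minimisation: no
```

The change takes effect once Unbound re-reads its configuration, for example after `unbound-control reload` or a restart.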

freddieleeman commented 2 years ago

What strikes me as odd: why does the second lookup succeed? Shouldn't Unbound respond with an empty result every time when qname-minimisation is enabled?

gthess commented 2 years ago

For the second lookup, Unbound already knows the delegation point (nameserver) for _tls....; it's the one that replied NXDOMAIN for the first lookup. Unbound will then send a query for the next label, _smtp._tls...., to that same nameserver.
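
The difference between the two lookups can be sketched with a toy model in Python. It assumes that what the resolver remembers can be represented as a count of labels already handled (`known_depth`, a hypothetical name), and it uses a made-up `ANSWERS` table in place of the real nameserver. It only mirrors the behavior described in this thread, not Unbound's actual cache or iterator logic.

```python
# Hypothetical authoritative responses, modelled on this thread.
ANSWERS = {
    "vladimirpivo.ru.": "NOERROR",
    "_tls.vladimirpivo.ru.": "NXDOMAIN",  # wrong for an empty non-terminal
    "_smtp._tls.vladimirpivo.ru.": "NOERROR",
}


def lookup(qname, known_depth):
    """Toy lookup that starts one label past what is already known.

    known_depth counts the rightmost labels of qname handled by an earlier
    lookup. Returns (rcode, names asked, new known_depth).
    """
    labels = qname.rstrip(".").split(".")
    asked = []
    for depth in range(known_depth + 1, len(labels) + 1):
        name = ".".join(labels[-depth:]) + "."
        asked.append(name)
        if ANSWERS.get(name, "NXDOMAIN") == "NXDOMAIN":
            return "NXDOMAIN", asked, depth
    return "NOERROR", asked, len(labels)


qname = "_smtp._tls.vladimirpivo.ru"
first = lookup(qname, known_depth=1)          # assume "ru." is already known
second = lookup(qname, known_depth=first[2])  # resume past _tls.vladimirpivo.ru.
print(first)
print(second)
```

In this model the first call stops at `_tls.vladimirpivo.ru.` with NXDOMAIN, while the second call asks only for the full name and gets NOERROR.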

freddieleeman commented 2 years ago

So, this is by design? Because this causes a confusing and inconsistent result. I would expect the result to be the same for every query to unbound.

gthess commented 2 years ago

In this case the confusion arises because the validator module is in use but, I am guessing, without a DNSSEC trust anchor. The validator module instructs the iterator module to stop and hand over the NXDOMAIN response, but because there is no trust anchor, no validation happens, which allows subsequent queries to behave differently. If you either provide a trust anchor or remove the validator module, Unbound will yield consistent results.
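
The two alternatives might look roughly like this in unbound.conf. This is a sketch rather than a recommended configuration: the trust anchor path is an assumption and varies between systems and packages.

```
server:
    # Option 1: drop the validator module entirely (no DNSSEC validation).
    module-config: "iterator"

    # Option 2: keep validation and supply a trust anchor instead.
    # The path below is an assumption; use the location your system uses.
    # module-config: "validator iterator"
    # auto-trust-anchor-file: "/var/lib/unbound/root.key"
```

As noted earlier in the thread, with validation working and this domain being DNSSEC signed, the consistent result for this particular name would be a failure to resolve rather than a successful answer.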