Install succeeds but * some * valid domains will not resolve

drphr4ud commented 7 years ago

Installed

https://github.com/Angristan/Local-DNS-resolver/blob/master/ubuntu-unbound.sh on Ubuntu 16.04

also tried https://github.com/Angristan/Local-DNS-resolver/blob/master/centos-unbound.sh on CentOS 7.

Install succeeded. Service starts ok and is responsive:

root@dns2:~# unbound-control reload
ok

root@dns2:~# unbound-control status
version: 1.5.8
verbosity: 3
threads: 2
modules: 2 [ validator iterator ]
uptime: 415851 seconds
options: control(ssl)
unbound (pid 1469) is running...

As far as I can tell, I can usually resolve unsigned domains:

root@dns2:~# dig espncricinfo.com +dnssec +multi

; <<>> DiG 9.10.3-P4-Ubuntu <<>> espncricinfo.com +dnssec +multi
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, **status: NOERROR**, id: 61648
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;espncricinfo.com.      IN A

;; ANSWER SECTION:
espncricinfo.com.       573 IN **A 52.19.167.6**

Most DNSSEC signed domains resolve OK, too:

root@dns2:~# dig dnssectest.sidn.nl +dnssec +multi

; <<>> DiG 9.10.3-P4-Ubuntu <<>> dnssectest.sidn.nl +dnssec +multi
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54219
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 5, ADDITIONAL: 17

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;dnssectest.sidn.nl.    IN A

_[truncated irrelevant output]_

Stuff that should fail also tends to fail:

root@dns2:~# dig www.dnssec-failed.org

; <<>> DiG 9.10.3-P4-Ubuntu <<>> www.dnssec-failed.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, **status: SERVFAIL**, id: 61846
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.dnssec-failed.org.         IN      A

However, some lookups fail and I have no idea why. It does not seem to matter whether the domain is signed or not.

I first noticed that I couldn't visit http://ipleak.net anymore.

Then half the apps on my Roku claimed they have no connectivity because lookups failed.

root@dns2:~# dig -t A ipleak.net @127.0.0.1

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -t A ipleak.net @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, **status: NOERROR**, id: 3183
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ipleak.net.                    IN      A

;; Query time: 190 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Oct 27 00:54:28 SGT 2017
;; MSG SIZE  rcvd: 39

It returns NOERROR but the answer section is empty (ANSWER: 0).

Compare with:

root@dns2:~# dig -t A ipleak.net @208.67.222.222

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -t A ipleak.net @208.67.222.222
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, **status: NOERROR**, id: 775
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ipleak.net.                    IN      A

;; ANSWER SECTION:
ipleak.net.             376     IN      **A       95.85.16.212**

;; Query time: 177 msec
;; SERVER: 208.67.222.222#53(208.67.222.222)
;; WHEN: Fri Oct 27 00:55:11 SGT 2017
;; MSG SIZE  rcvd: 55

First I thought it might just be an Ubuntu thing. But it happens on CentOS, too. Then I thought some root servers might be refusing queries from some of my hosts (Vultr netblock). But I ended up setting it up on a bunch of other hosts on Softlayer, DO, etc. in various regions, and the issue persists in all cases.

What's the best way to troubleshoot this?

Some people with similar issues blamed UDP fragmentation as the culprit. I tried

edns-buffer-size: 1280 in unbound.conf but it did not help.

angristan commented 7 years ago

As aeris advised on Twitter, I don't have these issues when installing unbound and using it out of the box (without the script). So obviously this is an issue with the configuration/the installation script.

angristan commented 7 years ago

Also it seems the script is useless, on Debian at least :)

angristan commented 7 years ago

Sp3r4z found the issue: it was use-caps-for-id, which is an experimental feature.

drphr4ud commented 7 years ago

Tested and confirmed that removing

use-caps-for-id: yes

from unbound.conf resolved the issue!

bortzmeyer commented 7 years ago

The problem is not in Unbound, or in the Debian package. use-caps-for-id is perfectly legitimate, since DNS is and has always been case-INsensitive.

No, the problem is that dnsleak.net name servers are deeply broken:

% dig @dns1.dnsleak.net A ipleak.net

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @dns1.dnsleak.net A ipleak.net
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61570
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ipleak.net.        IN A

;; ANSWER SECTION:
ipleak.net.     3600 IN AAAA 2a03:b0c0:0:1010::509:d001
ipleak.net.     3600 IN A 95.85.16.212

;; Query time: 25 msec
;; SERVER: 2a03:b0c0:0:1010::509:d001#53(2a03:b0c0:0:1010::509:d001)
;; WHEN: Sat Oct 28 11:58:21 CEST 2017
;; MSG SIZE  rcvd: 72

% dig @dns1.dnsleak.net A IPleak.net

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @dns1.dnsleak.net A IPleak.net
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24072
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;IPleak.net.        IN A

;; Query time: 26 msec
;; SERVER: 2a03:b0c0:0:1010::509:d001#53(2a03:b0c0:0:1010::509:d001)
;; WHEN: Sat Oct 28 11:58:26 CEST 2017
;; MSG SIZE  rcvd: 28

They don't return data when the case changes. That's an awful violation of DNS case-insensitivity. Unbound was right to reject it.
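To make the expected behaviour concrete, here is a minimal Python sketch of how a name lookup is supposed to treat case. The `ZONE` table and the `normalize` and `lookup` functions are hypothetical names used only for illustration (the address is the one from the dig output above); this is not how any particular name server is implemented.

```python
from typing import Dict, List, Tuple

# Illustrative only: ZONE and lookup() are made-up names, not a real server API.
ZONE: Dict[Tuple[str, str], List[str]] = {
    ("ipleak.net.", "A"): ["95.85.16.212"],
}

def normalize(name: str) -> str:
    """Lowercase the name and ensure a trailing dot, so that
    comparisons ignore case as DNS requires."""
    name = name.lower()
    return name if name.endswith(".") else name + "."

def lookup(qname: str, qtype: str) -> List[str]:
    """Return the records for qname/qtype, matching the name case-insensitively."""
    return ZONE.get((normalize(qname), qtype.upper()), [])

# Both spellings should give the same answer on a compliant server.
print(lookup("ipleak.net", "A"))
print(lookup("IPleak.net", "A"))
```

A server that behaves like this returns the same records for `ipleak.net` and `IPleak.net`; the dig output above shows the dnsleak.net servers returning an empty answer for the second spelling.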

angristan commented 7 years ago

Thanks @bortzmeyer. Should we be using use-caps-for-id then? I understand it's used to foil spoof attempts.

bortzmeyer commented 7 years ago

@Angristan Yes, use-caps-for-id is a (limited) protection against spoofing attempts. It is documented in the draft "Use of Bit 0x20 in DNS Labels to Improve Transaction Identity". You should not disable it just because there are broken servers on the Internet.
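For readers unfamiliar with the 0x20 trick, the idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Unbound's actual code: `randomize_case` and `response_matches` are hypothetical helper names, and a real resolver does this on wire-format packets rather than strings.

```python
import random

def randomize_case(qname: str, rng: random.Random) -> str:
    """Set each letter to upper or lower case at random (the "0x20" trick)."""
    return "".join(
        (c.upper() if rng.random() < 0.5 else c.lower()) if c.isalpha() else c
        for c in qname
    )

def response_matches(sent_qname: str, echoed_qname: str) -> bool:
    """Accept a response only if it echoes the query name with identical case."""
    return sent_qname == echoed_qname

rng = random.Random()
sent = randomize_case("ipleak.net", rng)
print(sent)                                      # some random mix of cases
print(response_matches(sent, sent))              # server echoed the name verbatim
print(response_matches(sent, sent.swapcase()))   # reply with the wrong case pattern
```

An off-path spoofer then has to guess the case pattern in addition to the query ID and source port. Each letter in the name adds one bit to guess, which is why the protection is limited for short names.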

drphr4ud commented 7 years ago

The problem is there seem to be many broken servers on the internet.

Lots of stuff broke. Not just obscure little fringe cases like ipleak.net.

I just used ipleak.net as an example in the report as it is short and easy to remember.

angristan commented 7 years ago

@drphr4ud removing use-caps-for-id resolved the issues you had with all those domains?

drphr4ud commented 7 years ago

Yes it did.

I see that what bortzmeyer said is 100% correct:

dig -t A iPLEaK.NeT

returns nothing, but it should!

Google had the same problem, it seems, and found that 70% of their DNS traffic gets RFC-compliant responses but 30% does not. They made a whitelist to work around it:

Our current solution to this problem is to create a whitelist of name servers which we know apply the standards correctly, and to only apply the case randomization technique in requests to those servers.

bortzmeyer commented 7 years ago

I have trouble believing the problem is so common ("30 %"). At home, I use a resolver with the 0x20 trick (Knot Resolver) and, while ipleak.net indeed does not work, neither I nor either of the two non-geek users noticed anything (and, believe me, they are quick to report problems).

Any other examples of problems in the real world? Which domains?

angristan commented 7 years ago

I also never noticed any issue other than with ipleak.net.

drphr4ud commented 7 years ago

Like I said, half the apps on my Roku would not work anymore when the Roku used the unbound resolver with use-caps-for-id: yes set.

IIRC Hulu and Vudu had issues resolving their CDN servers. To reiterate: I had massive usability problems and ipleak.net was just mentioned because it's easy to remember.

I do not work for Google so no idea how accurate their numbers are but they say that overall across 8.8.8.8, 8.8.4.4 and their entire public DNS traffic:

Our current solution to this problem is to create a whitelist of name servers which we know apply the standards correctly, and to only apply the case randomization technique in requests to those servers. We also list the appropriate exception subdomains for each of them, based on analyzing our logs. If a response that appears to come from those servers does not contain the correct case, we reject the response.

The whitelisted name servers comprise more than 70% of our traffic.
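The approach in that quote could be sketched roughly as follows. This is my own reading of the description, not Google's code: the server names in `CASE_SAFE_SERVERS` are placeholders, the function names are hypothetical, and the per-server exception subdomains mentioned in the quote are left out for brevity.

```python
import random

# Hypothetical data: these server names are placeholders, not Google's actual list.
CASE_SAFE_SERVERS = {"ns1.example.net.", "ns2.example.net."}

def query_name_for(server: str, qname: str, rng: random.Random) -> str:
    """Randomize case only when the target name server is on the whitelist."""
    if server.lower() not in CASE_SAFE_SERVERS:
        return qname
    return "".join(
        c.upper() if rng.random() < 0.5 else c.lower() for c in qname
    )

def accept_response(server: str, sent_qname: str, echoed_qname: str) -> bool:
    """Whitelisted servers must echo the exact case; others are compared
    case-insensitively."""
    if server.lower() in CASE_SAFE_SERVERS:
        return sent_qname == echoed_qname
    return sent_qname.lower() == echoed_qname.lower()

rng = random.Random()
q = query_name_for("ns1.example.net.", "ipleak.net", rng)
print(q, accept_response("ns1.example.net.", q, q))
```

With this kind of scheme, servers that are not on the list never see a mixed-case query, so domains hosted on them keep resolving, at the cost of losing the extra spoofing protection for those names.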

bortzmeyer commented 7 years ago

@drphr4ud You say so but you do not provide even one extra name (besides ipleak.net) of a domain that fails to resolve.

drphr4ud commented 7 years ago

Because I am not at the site where I can break the config again and make it fail and run wireshark to see what DNS queries are made....

The issues were what prompted me to log this issue. ipleak.net came into play for me much later in the process. My first observation was:

I used unbound with use-caps-for-id: yes enabled and various stuff broke. Applications claimed I was not connected to the internet. A Sharp TV wouldn't check for firmware updates and would lock up.

Set DHCP Server to let these devices use 8.8.8.8 and 8.8.4.4 instead and they all worked again from that moment. Changed them back to use unbound and they died again.

Somewhere along the way I noticed that one domain that doesn't work is ipleak.net.

angristan / local-dns-resolver

Install succeeds but * some * valid domains will not resolve #8