AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0
25.5k stars 1.83k forks source link

DNS over TLS server no longer working after v0.107.0 upgrade #3977

Closed tescophil closed 2 years ago

tescophil commented 2 years ago

Issue Details

Expected Behavior

I expect to be able to send DNS queries to the AdGuard server over TLS

Actual Behavior

I have three phones that use Androids Private DNS feature to send DoT requests to AdGuard. This was working perfectly in the previous version, but now all three devices are unable to connect over TLS with a "Couldn't Connect" message under Android Private DNS setting

Screenshots

Screenshot:

Additional Information

I have also just tried this on an iPhone, and it still works..., so it looks like its limited to Android devices (at least on my network)

After some further investigation I have found the following :

If I use the "Automatic" setting under Private DNS on Android, DoT requests get sent directly to my AGH instance by the phone and are resolved correctly. Researching exactly what the "Automatic" setting does says that the phone will attempt to send a DoT request to the phones existing DNS servers assigned via DHCP. This makes sense, as the phone has the AGH DNS server assigned over DHCP. What I don't understand is why this works, as it should get a certificate failure. The cert on AGH is for "*.mydomain.duckdns.org, mydomain.duckdns.org" which I don't believe would be included in the DoT request from the phone.

I was using the "Custom" option on Android Private DNS with "myphone.mydomain.duckdns.org" so that each DoT client is specifically identified and logged (as opposed to just using "mydomain.duckdns.org"). On my firewall appliance I can see these requests being sent to AdGuard Home, but no request appears in the AGH logs. I also tried just setting "mydomain.duckdns.org" and this also fails...

So, the requests I think should fail are working, and the ones that I think should (and were) working fail...

ainar-g commented 2 years ago

Hello and thanks for the report. Just to be clear, do you have any entries in the allowlist or blocklist sections on the “Settings → DNS settings” page? Does resolving myphone.mydomain.duckdns.org with e.g. dig work from another machine in the network? Also, have you tried enabling and inspecting verbose logs for clues?

tescophil commented 2 years ago

Nothing in the Allowed clients and a bunch of IP ranges from China under disallowed clients that kept pinging AGH on port 853 (port scanning malware..)

The hostname I use on the phone resolves to my public IP as designed/expected.

phil@Desktop-PC:~$ dig phonephil.mydomain.duckdns.org

; <<>> DiG 9.16.22-Debian <<>> phonephil.mydomain.duckdns.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29280
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;phonephil.mydomain.duckdns.org.    IN  A

;; ANSWER SECTION:
phonephil.mydomain.duckdns.org. 3558 IN A   XXX.XXX.XXX.XXX <My public IP address>

;; Query time: 3 msec
;; SERVER: 192.168.100.10#53(192.168.100.10)
;; WHEN: Wed Dec 22 15:29:35 GMT 2021
;; MSG SIZE  rcvd: 73

phil@Desktop-PC:~$

This has all been working fine for months.., I've used the DoT server since it was first introduced, and apart from a certificate keychain trust issue several months ago, its been flawless...

OK, setup verbose logging, but I have over 70 clients on the network generating around 130K requests per day..., any clue as to what to look for ?

ainar-g commented 2 years ago

OK, setup verbose logging, but I have over 70 clients on the network generating around 130K requests per day..., any clue as to what to look for ?

Probably something related to TLS. Something along the lines of:

grep -e 'tls' -e 'error' -i | grep -e 'noerror' -v

You could also send (possibly anonymised) logs to devteam@adguard.com, if you want.

tescophil commented 2 years ago

OK, think I've found something (which is actually a lack of something...)

2021/12/22 15:54:50.545507 82986#727 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 192.168.10.143:59016
2021/12/22 15:54:50.545957 82986#727 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 5461
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;phonephil.mydomain.duckdns.org.    IN   AAAA

2021/12/22 15:54:50.546410 82986#727 [debug] IPv6 is disabled. Reply with NoError to phonephil.mydomain.duckdns.org. AAAA request
2021/12/22 15:54:50.546976 82986#727 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 5461
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;phonephil.mydomain.duckdns.org.    IN   AAAA

;; AUTHORITY SECTION:
phonephil.mydomain.duckdns.org. 10  IN  SOA fake-for-negative-caching.adguard.com. hostmaster.phonephil.mydomain.duckdns.org. 100500 1800 60 604800 86400

2021/12/22 15:54:50.558474 82986#728 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 192.168.10.143:28466
2021/12/22 15:54:50.558984 82986#728 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 42852
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;phonephil.mydomain.duckdns.org.    IN   A

2021/12/22 15:54:50.559531 82986#728 [debug] using settings for client phone_phil.mydomain.duckdns.org with ip 192.168.10.143 and id ""
2021/12/22 15:54:50.560074 82986#728 [debug] hosts container: handling the request
2021/12/22 15:54:50.561093 82986#728 [debug] filtering: found rule "@@||mydomain.duckdns.org^$important" for host "phonephil.mydomain.duckdns.org", filter list id: 0
2021/12/22 15:54:50.561620 82986#728 [debug] 192.168.100.1:53: sending request A phonephil.mydomain.duckdns.org.
2021/12/22 15:54:50.564026 82986#728 [debug] 192.168.100.1:53: response: ok
2021/12/22 15:54:50.564564 82986#728 [debug] github.com/AdguardTeam/dnsproxy/upstream.exchange(): upstream 192.168.100.1:53 successfully finished exchange of ;phonephil.mydomain.duckdns.org.  IN   A. Elapsed 2.926881ms.
2021/12/22 15:54:50.564972 82986#728 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).replyFromUpstream(): RTT: 3.423931ms
2021/12/22 15:54:50.565364 82986#728 [debug] ipset: starting processing
2021/12/22 15:54:50.565678 82986#728 [debug] ipset: added 0 new ipset entries
2021/12/22 15:54:50.566016 82986#728 [debug] client ip: 192.168.10.143
2021/12/22 15:54:50.566468 82986#728 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 42852
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;phonephil.mydomain.duckdns.org.    IN   A

;; ANSWER SECTION:
phonephil.mydomain.duckdns.org. 3600    IN  A   xxx.xxx.xxx.xxx <My Public IP>

2021/12/22 15:54:50.586926 82986#729 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).handleTCPConnection(): Start handling the new tls connection 192.168.10.143:37167

So, the start of this log is the point at which I set my Private DNS to phonephil.mydomain.duckdns.org, so the phone (192.168.10.143) sends a Plain UDP DNS request to AGH to resolve this domain. This then returns with my public IP (as designed/expected). The very next line in the log is

2021/12/22 15:54:50.586926 82986#729 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).handleTCPConnection(): Start handling the new tls connection 192.168.10.143:37167

And that's it, no more logs with the ref 82986#729 and the very next line in the log is a new UDP request from a different client. So it looks like AGH immediately (and silently) dropped the DoT request from the phone....

Here is how the log carries on at that point

2021/12/22 15:54:50.586926 82986#729 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).handleTCPConnection(): Start handling the new tls connection 192.168.10.143:37167
2021/12/22 15:54:50.982213 82986#668 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 192.168.10.253:60067
2021/12/22 15:54:50.982213 82986#669 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 192.168.10.253:60067
2021/12/22 15:54:50.982694 82986#668 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 61536
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   A

2021/12/22 15:54:50.983064 82986#669 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 63584
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   AAAA

2021/12/22 15:54:50.983595 82986#668 [debug] hosts container: handling the request
2021/12/22 15:54:50.983623 82986#669 [debug] IPv6 is disabled. Reply with NoError to www.google.com. AAAA request
2021/12/22 15:54:50.984386 82986#668 [debug] serving response from general cache
2021/12/22 15:54:50.984393 82986#669 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 63584
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   AAAA

;; AUTHORITY SECTION:
www.google.com. 10  IN  SOA fake-for-negative-caching.adguard.com. hostmaster.www.google.com. 100500 1800 60 604800 86400

2021/12/22 15:54:50.984748 82986#668 [debug] DNSFwd: Checking record A (142.250.200.4) for www.google.com.
2021/12/22 15:54:50.985522 82986#668 [debug] ipset: starting processing
2021/12/22 15:54:50.985837 82986#668 [debug] ipset: added 0 new ipset entries
2021/12/22 15:54:50.986178 82986#668 [debug] client ip: 192.168.10.253
2021/12/22 15:54:50.986618 82986#668 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 61536
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   A

;; ANSWER SECTION:
www.google.com. 3533    IN  A   142.250.200.4

2021/12/22 15:54:51.976988 82986#589 [debug] GET /control/status
2021/12/22 15:54:52.194143 82986#589 [debug] GET /control/filtering/status
2021/12/22 15:54:52.338178 82986#589 [debug] GET /control/parental/status
2021/12/22 15:54:52.484072 82986#589 [debug] GET /control/safebrowsing/status
2021/12/22 15:54:52.698188 82986#589 [debug] GET /control/safesearch/status
2021/12/22 15:54:54.486287 82986#737 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 192.168.10.158:53962
2021/12/22 15:54:54.486438 82986#672 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 192.168.10.158:53962
2021/12/22 15:54:54.486590 82986#737 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 49573
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   AAAA

2021/12/22 15:54:54.486661 82986#672 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 46757
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   A

2021/12/22 15:54:54.486939 82986#737 [debug] IPv6 is disabled. Reply with NoError to www.google.com. AAAA request
2021/12/22 15:54:54.487160 82986#672 [debug] hosts container: handling the request
2021/12/22 15:54:54.487249 82986#737 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 49573
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   AAAA

;; AUTHORITY SECTION:
www.google.com. 10  IN  SOA fake-for-negative-caching.adguard.com. hostmaster.www.google.com. 100500 1800 60 604800 86400

2021/12/22 15:54:54.487539 82986#672 [debug] serving response from general cache
2021/12/22 15:54:54.487701 82986#672 [debug] DNSFwd: Checking record A (142.250.200.4) for www.google.com.
2021/12/22 15:54:54.487956 82986#672 [debug] ipset: starting processing
2021/12/22 15:54:54.488103 82986#672 [debug] ipset: added 0 new ipset entries
2021/12/22 15:54:54.488267 82986#672 [debug] client ip: 192.168.10.158
2021/12/22 15:54:54.488484 82986#672 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 46757
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.    IN   A

;; ANSWER SECTION:
www.google.com. 3529    IN  A   142.250.200.4
ainar-g commented 2 years ago

Thanks for the data! At the very least it shows that we should really improve logging in our libraries there. We'll push a version with improved logging to the edge channel sometime tomorrow. Hopefully, that will clarify, what is happening in there.

ainar-g commented 2 years ago

@tescophil, we've made some improvements to logging and other things in version v0.108.0-a.11+d9df7c13 on the edge channel. Could you please try making queries from your device using that version?

Also, just in case, was your certificate issued by Let's Encrypt? There were issues with Android clients relating to their root certificate expiring recently. See #3689.

tescophil commented 2 years ago

I'm away from home for Christmas now, so will give it a go when I get back. Already sorted the root cert issue at the time. 👍

cacedo commented 2 years ago

Not sure if the same issue, but since a week ago I also get an error from my android devices that use "Private DNS" with my Adguard home instance. Also I get the same message in Adguard logs. I have installed edge version, now I see:

2021/12/28 14:37:58.529979 1#30 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).handleTCPConnection(): handling tcp: started handling tls request from 192.168.1.1:51114
2021/12/28 14:37:58.559174 1#30 [error] handling tcp: reading: reading len: remote error: tls: expired certificate

Which is strange since I can see is valid:

Status:
Certificate chain is valid
Subject: CN=********
Issuer: CN=R3,O=Let's Encrypt,C=US
Expires: 2022-03-03 08:27:10
Hostnames: ********

Also tried to use a certificate generated by ZeroSSL, same result.

tescophil commented 2 years ago

Installed the latest edge version and get the same behaviour. Tried to enable verbose logging on this version, and it just crashes silently without generating any logs. Got to be the same error @cacedo is seeing

tescophil commented 2 years ago

@ainar-g Did the debug output from @cacedo make sense / help pinpoint the issue ? Anything else we can do to help ?

tescophil commented 2 years ago

No movement on this issue for two weeks now, still looks like a (new, i.e. not #3689) certificate issue with queries from Android devices. I installed dnslookup and made some DoT queries from my desktop, all of which work perfectly...

phil@Desktop-PC:~$ dnslookup google.com tls://phonephil.mydomain.duckdns.org
dnslookup v. 1.5.1
dnslookup result:
;; opcode: QUERY, status: NOERROR, id: 27239
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.    IN   A

;; ANSWER SECTION:
google.com. 1788    IN  A   209.85.203.101
google.com. 1788    IN  A   209.85.203.102
google.com. 1788    IN  A   209.85.203.139
google.com. 1788    IN  A   209.85.203.138
google.com. 1788    IN  A   209.85.203.100
google.com. 1788    IN  A   209.85.203.113

phil@Desktop-PC:~$ 

Queries also continue to work from iOS devices.

What does work is from Android is using the 'Automatic' setting under 'Private DNS' which, from what documentation I can find, does something along the lines of this:

phil@Desktop-PC:~$ VERIFY=0 dnslookup google.com tls://192.168.100.10
dnslookup v. 1.5.1
TLS verification has been disabled
dnslookup result:
;; opcode: QUERY, status: NOERROR, id: 31352
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.    IN   A

;; ANSWER SECTION:
google.com. 1711    IN  A   209.85.203.101
google.com. 1711    IN  A   209.85.203.102
google.com. 1711    IN  A   209.85.203.139
google.com. 1711    IN  A   209.85.203.138
google.com. 1711    IN  A   209.85.203.100
google.com. 1711    IN  A   209.85.203.113

phil@Desktop-PC:~$ 

i.e. Sending the DoT request directlty to the system configured DNS servers with TLS auth turned off (otherwise there would be a certificate falure like this:

phil@Desktop-PC:~$ dnslookup google.com tls://192.168.100.10
dnslookup v. 1.5.1
2022/01/20 10:15:51 Cannot make the DNS request: Failed to get a connection from TLSPool to tls://192.168.100.10:853, cause: Failed to connect to 192.168.100.10, cause: x509: cannot validate certificate for 192.168.100.10 because it doesn't contain any IP SANs
phil@Desktop-PC:~$

Is there anything else I can do to move this issue forward ?

tescophil commented 2 years ago

Did this at the time, however I've just forced a renewal of the LetsEncrypt cert for AdGuardHome and then rebooted the machine its on. Now DoT requests are working again from all devices... Don't understand this, as my cert was not due for renewal until the end of Feb, and I had already implemented the 'fix' for the #3689 issue way back in September...

cacedo commented 2 years ago

Hi,

In my case I fixed it by deleting my AAAA (ipv6) record for my home domain. I noticed some apps that access to my home services were working but really slow opening connections. After deleting the AAAA record, connections were working fine again and also was the "Private DNS" that points to my home server.