AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0

AdGuard Home cannot work for 5 minutes after boot, DNS timeouts (Raspberry Pi) #633

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

When setting the Upstream DNS servers from the default TLS to Cloudflare's DoH (https://cloudflare-dns.com/dns-query), rebooting the Pi causes AdGuard Home to behave unexpectedly once booted.

No clients can use DNS; all DNS requests time out. Trying to visit the admin page causes the settings page to freeze and not load for a very long time, with errors about timeouts for version checks to GitHub. If I eventually get it to load, I need to clear the DoH entry so it reverts to the default TLS entries.

Once it uses TLS again, all clients can use DNS fine and the AdGuard Home admin interface works normally again.

So DNS-over-HTTPS does not work for me once I restart the Pi.

ameshkov commented 5 years ago

Could you please collect AGH logs (preferably, with verbose logging enabled)?

ameshkov commented 5 years ago

Tbh, it sounds like some kind of a recursion issue (AGH tries to resolve Cloudflare domain name through itself). What bootstrap DNS server do you have configured?

ghost commented 5 years ago

Oops, you are right. I do have 1.1.1.1 as the bootstrap DNS server. Does this mean the bootstrap must be different from the upstream DNS server, or can I use IP addresses instead of the domain name, since the default TLS works at the moment?

For example, instead of https://cloudflare-dns.com/dns-query, I would set https://1.1.1.1/dns-query and https://1.0.0.1/dns-query.

ameshkov commented 5 years ago

Oops, you are right. I do have 1.1.1.1 as the bootstrap DNS server, does this mean the bootstrap must be different to the Upstream DNS server

Well, actually it should've worked just okay, what matters is that bootstrap DNS shouldn't be 127.0.0.1.
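To make the bootstrap point concrete: the bootstrap server is only used to turn the DoH hostname (cloudflare-dns.com) into an IP address, so any public resolver works as long as it is not the AGH instance itself. Below is a minimal Go sketch under that assumption, not dnsproxy's actual code; `validateBootstrap` and `newBootstrapResolver` are hypothetical names, and also rejecting unspecified addresses (0.0.0.0, ::) is an extra assumption beyond what is said about 127.0.0.1.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// validateBootstrap (hypothetical) rejects bootstrap addresses that would make
// the server resolve its own upstream hostnames through itself, e.g. 127.0.0.1:53.
func validateBootstrap(addr string) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("bootstrap %q: %w", addr, err)
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return fmt.Errorf("bootstrap %q: not an IP literal", addr)
	}
	if ip.IsLoopback() || ip.IsUnspecified() {
		return fmt.Errorf("bootstrap %q: would loop back into this server", addr)
	}
	return nil
}

// newBootstrapResolver (hypothetical) returns a resolver that sends every
// lookup to the bootstrap server, ignoring the nameservers in /etc/resolv.conf.
func newBootstrapResolver(addr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so the Dial hook is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, addr)
		},
	}
}

func main() {
	for _, addr := range []string{"1.1.1.1:53", "127.0.0.1:53"} {
		fmt.Println(addr, "->", validateBootstrap(addr))
	}
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	ips, err := newBootstrapResolver("1.1.1.1:53").LookupHost(ctx, "cloudflare-dns.com")
	fmt.Println(ips, err)
}
```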

Could you please export logs just in case? We want to make a DoH upstream the default, so it might be important.

ghost commented 5 years ago

Could you please export logs just in case? We want to make a DOH upstream default so it might be important.

I've done it with verbose, but there is quite a bit of metadata. Could I email it to you so it's not public on here?

From what I see, if you set the DoH without rebooting, it can work fine:

2019/03/12 09:51:22 [42] proxy.exchangeWithUpstream(): upstream https://cloudflare-dns.com:443/dns-query successfully finished exchange of ;adguardteam.github.io

But once rebooted, you get this instead:

2019/03/12 09:54:18 [40] proxy.exchangeWithUpstream(): upstream https://cloudflare-dns.com:443/dns-query failed to exchange ;sb.adtidy.org.

And since the admin interface relies on external connections, it also suffers.

ameshkov commented 5 years ago

I've done it with verbose, but there is quite a bit of metadata. Could I email it to you so it's not public on here?

Sure, please send it to devteam@adguard.com

ghost commented 5 years ago

Thank you, sent it now.

ameshkov commented 5 years ago

Got the log, thank you!

Quick analysis:

  1. Bootstrap DNS is okay, it managed to resolve cloudflare-dns.com

    2019/03/12 09:54:08 [40] upstream.lookup(): successfully finish lookup for cloudflare-dns.com in 34 milliseconds using 1.1.1.1:53. Result : [{104.16.112.25 } {104.16.111.25 } {2606:4700::6810:7019 } {2606:4700::6810:6f19 }]
  2. It has successfully connected to the DOH server:

    2019/03/12 09:54:13 [75] upstream.createDialContext.func1(): Dialing to 104.16.112.25:443
    2019/03/12 09:54:13 [75] upstream.createDialContext.func1(): dialer successfully initialize connection to 104.16.112.25:443 in 25 milliseconds
  3. Then the HTTP request timed out:

    2019/03/12 09:54:18 [40] proxy.exchangeWithUpstream(): upstream https://cloudflare-dns.com:443/dns-query failed to exchange ;sb.adtidy.org. IN   A in 10043665621 milliseconds. Cause: couldn't do a POST request to 'https://cloudflare-dns.com:443/dns-query', cause: Post https://cloudflare-dns.com:443/dns-query: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

It means that the TLS connection was also established without issues, but there's no response from CF to the DNS query.
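For readers decoding the log above: a DoH exchange is an HTTPS POST of a wire-format DNS message (RFC 8484), and the Go HTTP client's `Timeout` bounds the whole request. The sketch below is an illustration, not dnsproxy's implementation; `newDoHRequest`, `dohExchange` and `runTimeoutDemo` are hypothetical, and the demo uses a local plain-HTTP `httptest` server that answers late to reproduce the same kind of client timeout.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// RFC 8484 media type for wire-format DNS messages.
const dnsMessageType = "application/dns-message"

// newDoHRequest builds an RFC 8484 POST request carrying a wire-format query.
func newDoHRequest(url string, query []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(query))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", dnsMessageType)
	req.Header.Set("Accept", dnsMessageType)
	return req, nil
}

// dohExchange sends the query and returns the raw response body. client.Timeout
// covers the whole exchange: dialing, TLS, waiting for headers and the body.
func dohExchange(client *http.Client, url string, query []byte) ([]byte, error) {
	req, err := newDoHRequest(url, query)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		// A server that never answers in time ends up here, with an error
		// mentioning "Client.Timeout exceeded while awaiting headers".
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("doh: unexpected status %d", resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}

// runTimeoutDemo reproduces the symptom locally with a test server that
// accepts the connection but replies only after serverDelayMs.
func runTimeoutDemo(serverDelayMs, clientTimeoutMs int) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		time.Sleep(time.Duration(serverDelayMs) * time.Millisecond)
		w.Header().Set("Content-Type", dnsMessageType)
		w.Write(body) // echo the query back as a stand-in answer
	}))
	defer srv.Close()
	client := &http.Client{Timeout: time.Duration(clientTimeoutMs) * time.Millisecond}
	_, err := dohExchange(client, srv.URL, []byte{0x12, 0x34})
	return err
}

func main() {
	fmt.Println("fast server:", runTimeoutDemo(0, 2000))
	fmt.Println("slow server:", runTimeoutDemo(300, 100))
}
```

One detail worth knowing when reading the original log line: Go's net/http uses the wording "request canceled while waiting for connection" when the deadline expires before the transport has a usable connection (dial or TLS handshake still in progress), so checking the TLS handshake on its own may be worthwhile.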

Is this how it behaves every time or is it just after rebooting the pi?

ghost commented 5 years ago

Is this how it behaves every time or is it just after rebooting the pi?

Once I save the settings for the DoH server to overwrite the existing TLS defaults, it can still function (as you saw in the logs). But, once I either restart the service or restart the Pi, all DNS stops working no matter what I do. Once I set it back to TLS defaults and restart the service or Pi, all is working again.

ameshkov commented 5 years ago

@planet0 is it specific to Cloudflare or to any DOH resolver? Have you tried any other DOH server?

ghost commented 5 years ago

I tried to use Quad9's DoH and the same issue occurred. Unfortunately I also managed to completely break the entire AdGuard Home service as the default Cloudflare TLS also stopped working and rebooting the Pi wasn't doing anything either. Deleting the .yaml file and setting it up again still didn't work. Reinstalling the service didn't work either.

I reinstalled Pi-hole and the DNS started to work again without a reboot.

I think I need to try using AdGuard Home with a fresh install of Raspbian and not from a previous Pi-hole installation, and revisit this another time.

ameshkov commented 5 years ago

Sounds really unusual; I don't understand what could possibly cause all this :( It's as if something is preventing connections to non-standard ports, or something like that.

ghost commented 5 years ago

It might be due to how much Pi-hole modifies or configures the Pi, but I won't know until I do a fresh install... does anyone over at AdGuard have a Raspberry Pi that could replicate installing Pi-hole, uninstalling it and then trying to run AdGuard Home?

ameshkov commented 5 years ago

@planet0 yeah, I guess we can try that. Btw, have you been using the built-in DHCP in pi-hole and AGH?

@szolin could you please try doing it?

ghost commented 5 years ago

Btw, have you been using the built-in DHCP in pi-hole and AGH?

For Pi-hole I was using the built-in DHCP, as it works really well. For this instance I used my TP-Link to manage DHCP while using AdGuard Home, as AdGuard Home's DHCP still has some bugs to iron out and, from memory, no support for static leases.

The only other thing I can think of is that I was able to use 'cloudflared' alongside Pi-hole so I could use DNS-over-HTTPS for 1.1.1.1, with this guide: https://docs.pi-hole.net/guides/dns-over-https/

It was working really well.

I did uninstall the service and remove all cloudflared files before trying AdGuard Home, but the fact that cloudflared can work for DoH and not AdGuard Home is confusing.

ameshkov commented 5 years ago

@planet0 btw, any idea where the Koala suffix in the DNS questions comes from after the service restart?

;; QUESTION SECTION:
;adguardteam.github.io.Koala.   IN   A
ameshkov commented 5 years ago

Ah, nvm

ghost commented 5 years ago

It was the DHCP Domain name, yeah.
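The `.Koala` question is what a stub resolver produces when it expands a name with the search domain (here the DHCP-supplied domain name) from /etc/resolv.conf. Below is a simplified Go sketch of the ndots ordering described in resolv.conf(5), assuming the default `ndots:1`; `candidateNames` is a hypothetical helper, and real resolvers differ in exactly which errors make them move on to the next candidate.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames (hypothetical) lists the names a stub resolver would try, in
// order, for name given the resolv.conf search list and ndots option.
func candidateNames(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // already fully qualified: no expansion
	}
	hasNdots := strings.Count(name, ".") >= ndots
	fqdn := name + "."
	var names []string
	if hasNdots {
		names = append(names, fqdn) // enough dots: try the name as-is first
	}
	for _, suffix := range search {
		names = append(names, fqdn+strings.TrimSuffix(suffix, ".")+".")
	}
	if !hasNdots {
		names = append(names, fqdn) // short names: try as-is last
	}
	return names
}

func main() {
	fmt.Println(candidateNames("adguardteam.github.io", []string{"Koala"}, 1))
	// prints [adguardteam.github.io. adguardteam.github.io.Koala.]
}
```

With `ndots:1`, a name with two dots is tried as-is first, so a `.Koala` query appearing suggests the plain name did not resolve in time.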

ameshkov commented 5 years ago

It'd be interesting to check a couple more things:

  1. Try using https://1.1.1.1/dns-query as a DOH upstream
  2. After restart try this command to check connectivity to CF DNS: openssl s_client -connect 104.16.112.25:443 -servername cloudflare-dns.com. Make sure that this command is executed under the same user as AGH.
  3. Also, I'd like to see what you have in /etc/resolv.conf
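The openssl check in step 2 can also be done from Go, which is closer to what AGH itself does since it uses Go's TLS stack. A hedged sketch: `tlsProbe` is a hypothetical helper, and the IP and SNI name are the Cloudflare values from the bootstrap log quoted earlier.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// tlsProbe (hypothetical) dials addr (IP:port) and completes a TLS handshake
// using serverName for SNI and certificate verification, roughly what
// `openssl s_client -connect addr -servername serverName` checks.
func tlsProbe(addr, serverName string, timeout time.Duration) (time.Duration, error) {
	start := time.Now()
	// The dialer's timeout applies to the TCP connect and the handshake together.
	dialer := &net.Dialer{Timeout: timeout}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{ServerName: serverName})
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return time.Since(start), nil
}

func main() {
	d, err := tlsProbe("104.16.112.25:443", "cloudflare-dns.com", 10*time.Second)
	fmt.Println("handshake took", d, "err:", err)
}
```

As with the openssl command, run it under the same user as AGH and right after a reboot, while the problem is visible.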
ghost commented 5 years ago

Okay, I've played around with it some more after uninstalling Pi-hole again.

I found the root issue, and it's that when I reboot the Pi, AdGuard Home is unable to use the DNS port for about 5 minutes (I timed it with a stopwatch).

Once 5 minutes pass, all DNS traffic works, the version checks in the admin page pass, and I can also switch to and test any upstream DNS, so both https://cloudflare-dns.com/dns-query and https://1.1.1.1/dns-query work. I used v0.92-hotfix1 so I could see the banner appear for the new version. I can stop and restart AdGuard Home afterwards and it works straight away.

It's not related to the install or uninstall of the service, as running directly via sudo ./AdGuardHome --host 192.168.0.2 produces the same results (and was useful for testing).

This is why setting the DoH was working before rebooting when I opened this issue. Something on this Pi isn't letting AdGuard Home work for 5 minutes after booting up, yet Pi-hole is able to load instantly.

3. Also, I'd like to see what you have in /etc/resolv.conf

Pi-hole forces this to be 127.0.0.1 and leaves the static config on the Pi, but removing the static entry afterwards defaults resolv.conf to the IP of the Pi, in my case 192.168.0.2. Changing it to 1.1.1.1 or 8.8.8.8 doesn't help during these 5 minutes regardless.
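Why /etc/resolv.conf matters here: the Pi's own lookups (for example the filter downloads that later fail in this thread with `lookup hosts-file.net on 127.0.0.1:53`) go to whichever `nameserver` it lists, and the `search`/`domain` lines supply suffixes such as Koala. A small Go sketch for inspecting those directives; `parseResolvConf` is a hypothetical helper that deliberately ignores every other option.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// resolvConf holds only the directives relevant to this issue.
type resolvConf struct {
	Nameservers []string
	Search      []string
}

// parseResolvConf (hypothetical) extracts "nameserver" and "search"/"domain"
// lines from resolv.conf text; comments and other options are ignored.
func parseResolvConf(text string) resolvConf {
	var rc resolvConf
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		fields := strings.Fields(line)
		switch fields[0] {
		case "nameserver":
			if len(fields) > 1 {
				rc.Nameservers = append(rc.Nameservers, fields[1])
			}
		case "search", "domain":
			// resolv.conf(5): the last search/domain line wins.
			rc.Search = append([]string(nil), fields[1:]...)
		}
	}
	return rc
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println(err)
		return
	}
	rc := parseResolvConf(string(data))
	fmt.Println("nameservers:", rc.Nameservers, "search:", rc.Search)
}
```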

ameshkov commented 5 years ago

to use the DNS port

The issue is valid for DOH only, right? Is it okay when you select a plain DNS upstream?

yet Pi-hole is able to load instantly

Were you using its own DHCP when testing?

ghost commented 5 years ago

It got to the point where nothing was working, including regular upstreams or changing the bootstrap. After 5 minutes, all DNS types can work.

Pi-hole was still using the TP-Link DHCP device. I simply ran the install command, and once it finished, DNS started to work. After rebooting the Pi, it also works within seconds of the boot finishing.

Another thought: I installed a few ‘apt-get’ updates on the Pi; perhaps one of them changed the behaviour of Raspbian in the few weeks since I last used AdGuard Home.

ameshkov commented 5 years ago

Do they both (AGH and Pi-Hole) work under the same user? Is it root?

ghost commented 5 years ago

Do they both (AGH and Pi-Hole) work under the same user? Is it root?

AdGuard Home doesn't require any user setup, does it? It just runs under the person's regular user account (e.g. the default 'pi'), right?

From what I can see and from looking at their install script, I believe Pi-hole uses their own user (cloudflared does too).

Line 1708 in that script adds a user 'pihole'.

Also, looking at this issue over at Pi-hole (https://github.com/pi-hole/pi-hole/issues/1547), they mention that they use setcap for binding ports, which I personally do not understand, but it might shed some light on how they operate. They also seem to use root for some actions.

FTLDNS now runs as non-root but with setcap's for binding to lower ports and that's a big exposure reduced.

ghost commented 5 years ago

I think this is a wider non-AGH related issue, as I have had the same issue with cloudflared. Others are experiencing the same issue as me, so I suspect something Pi related and not an issue with AdGuard Home.

https://github.com/cloudflare/cloudflared/issues/23

ameshkov commented 5 years ago

Seems to be some consequence of the Pi-hole installation, but I still can't understand what exactly it is.

I do think it needs to be reported to Pi-hole rather than to cloudflared; the cloudflared team will be as clueless as we are.

ghost commented 5 years ago

This is a greater issue, as it is Raspbian related.

I did a fresh install of Raspbian from NOOBS, which seemed to be a package from Nov 2018. AdGuard Home works perfectly, and so does cloudflared.

Now I updated my Pi to the latest, and the same issue has occurred right away with AdGuard Home.

I took note of what was updated for reference:

apt apt-transport-https apt-utils base-files bluez-firmware curl gnupg gnupg-agent gpgv libapt-inst2.0 libapt-pkg5.0 libc-bin libc-dev-bin libc-l10n libc6 libc6-dbg libc6-dev libcurl3 libcurl3-gnutls libpam-systemd libperl5.24 libpolkit-agent-1-0 libpolkit-backend-1-0 libpolkit-gobject-1-0 libraspberrypi-bin libraspberrypi-dev libraspberrypi-doc libraspberrypi0 libssl1.0.2 libssl1.1 libsystemd0 libudev1 libwbclient0 libxapian30 locales multiarch-support openssh-client openssh-server openssh-sftp-server openssl perl perl-base perl-modules-5.24 policykit-1 python-rpi.gpio python3-six raspberrypi-bootloader raspberrypi-kernel raspberrypi-sys-mods raspi-config raspi-copies-and-fills samba-common ssh systemd systemd-sysv tzdata udev wireless-regdb wpasupplicant

@ameshkov, so it seems like it could be easily replicated now, just update Raspbian to the latest versions via sudo apt-get update && sudo apt-get upgrade && sudo apt-get dist-upgrade, then sudo reboot and watch AdGuard Home not work for 5 minutes. 😢

ghost commented 5 years ago

Also need to figure out why services like AGH and cloudflared experience this issue, but Pi-hole is able to avoid this 'bug' somehow in its functionality. Might help determine what exactly is the cause.

saltama commented 5 years ago

I've seen the same issue since 0.92 too (the first version I tried), and while I had also installed Pi-hole first, I doubt that Pi-hole could be the cause (I can't imagine how!). Regarding the user, I'm starting it as root; I guess they are using cap_net_bind_service to be able to open the privileged port 53 as another user.

szolin commented 5 years ago

Hello!

When AGH is installed as a service and the DHCP server is enabled, the first run fails because the network interface isn't set up yet. So the system tries to restart the service after some time.

After I disabled the DHCP server in my config, AGH starts quickly after reboot and the OS makes no attempts to restart the service, because it doesn't fail.
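szolin's diagnosis is that the DHCP server exits with a fatal error because eth0 has no IPv4 address yet, and systemd only retries after the hold-off time. One way to tolerate that, sketched in Go under the assumption that a bounded wait at startup is acceptable, is to poll the interface before starting the DHCP server. `pickIPv4` and `waitForIPv4` are hypothetical helpers, not the fix AGH actually shipped.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// pickIPv4 (hypothetical) returns the first IPv4 address among interface
// addresses in CIDR form, as produced by net.Addr.String() for *net.IPNet.
func pickIPv4(addrs []string) (string, bool) {
	for _, a := range addrs {
		ip, _, err := net.ParseCIDR(a)
		if err != nil {
			continue
		}
		if v4 := ip.To4(); v4 != nil {
			return v4.String(), true
		}
	}
	return "", false
}

// waitForIPv4 (hypothetical) polls the interface every `every` until it has
// an IPv4 address or timeout expires, instead of failing on the first try.
func waitForIPv4(name string, timeout, every time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		if iface, err := net.InterfaceByName(name); err == nil {
			if addrs, err := iface.Addrs(); err == nil {
				strs := make([]string, 0, len(addrs))
				for _, a := range addrs {
					strs = append(strs, a.String())
				}
				if ip, ok := pickIPv4(strs); ok {
					return ip, nil
				}
			}
		}
		if time.Now().After(deadline) {
			return "", errors.New("no IPv4 address on " + name + " before timeout")
		}
		time.Sleep(every)
	}
}

func main() {
	ip, err := waitForIPv4("eth0", 10*time.Second, time.Second)
	fmt.Println(ip, err)
}
```

A unit-level alternative is ordering the service with `Wants=network-online.target` and `After=network-online.target`, although how much that waits depends on the distribution's wait-online service.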

Here's my log:

Mar 28 14:43:08 raspberrypi AdGuardHome[347]: 2019/03/28 14:43:08 347#19 [info] Couldn't find IPv4 address of interface eth0 &{Index:2 MTU:1500 Name:eth0 HardwareAddr:b8:27:eb:40:2a:a2 Flags:up|broadcast|multicast}
Mar 28 14:43:08 raspberrypi AdGuardHome[347]: 2019/03/28 14:43:08 347#19 [fatal] Couldn't start DHCP server, cause: Couldn't find IPv4 address of interface eth0 &{Index:2 MTU:1500 Name:eth0 HardwareAddr:b8:27:eb:40:2a:a2 Flags:up|broadcast|multicast}
Mar 28 14:43:08 raspberrypi systemd[1]: AdGuardHome.service: Main process exited, code=exited, status=1/FAILURE
Mar 28 14:43:08 raspberrypi systemd[1]: AdGuardHome.service: Unit entered failed state.
Mar 28 14:43:08 raspberrypi systemd[1]: AdGuardHome.service: Failed with result 'exit-code'.
Mar 28 14:45:16 raspberrypi systemd[1]: AdGuardHome.service: Service hold-off time over, scheduling restart.

@planet0 Please run this command:

grep AdGuard /var/log/daemon.log

Do you see any lines similar to 'AdGuardHome.service: Service hold-off time over, scheduling restart.' ?

ghost commented 5 years ago

@szolin are you on an up-to-date Raspbian build, with packages updated via apt-get? I can replicate the issue every time on fresh Raspbian installs (install AGH, update, reboot), regardless of whether DHCP is enabled.

I’ll need to check the log as soon as I can to see what there is regarding AGH.

saltama commented 5 years ago

Same here, I've never enabled DHCP since I'm using the one on my router. I have configured the service like this:

[Unit]
Description=AdGuardHome DNS
After=network.target syslog.target nss-lookup.target

[Service]
Type=simple
Restart=always
ExecStart=/opt/AdGuardHome/AdGuardHome

[Install]
WantedBy=multi-user.target

And this is the non-verbose log from when I restarted a few minutes ago:

-- Logs begin at Thu 2019-03-28 14:04:15 GMT, end at Thu 2019-03-28 14:12:01 GMT. --
Mar 28 14:04:46 raspberrypi systemd[1]: Started AdGuardHome DNS.
Mar 28 14:04:47 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:47 [info] AdGuard Home, version v0.94-4-g0c86-dirty
Mar 28 14:04:48 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:48 [info] Upstream 0: 9.9.9.9:853
Mar 28 14:04:48 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:48 [info] Upstream 1: 1.1.1.1:853
Mar 28 14:04:48 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:48 [info] Upstream 2: 149.112.112.112:853
Mar 28 14:04:48 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:48 [info] Upstream 3: 1.0.0.1:853
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Start DNS server periodic jobs
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Starting the DNS proxy server
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] The server is configured to refuse ANY requests
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] DNS cache is enabled
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Creating the UDP server socket
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Listening to udp://[::]:53
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Creating the TCP server socket
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Listening to tcp://[::]:53
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] AdGuard Home is available on the following addresses:
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Go to http://127.0.0.1:3000
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Go to http://10.0.0.165:3000
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Entering the UDP listener loop on [::]:53
Mar 28 14:04:59 raspberrypi AdGuardHome[449]: 2019/03/28 14:04:59 [info] Entering the tcp listener loop on [::]:53
Mar 28 14:05:09 raspberrypi AdGuardHome[449]: 2019/03/28 14:05:09 [info] Couldn't request filter from URL https://hosts-file.net/ad_servers.txt, skipping: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:37922->127.0.0.1:53: i/o timeout
Mar 28 14:05:09 raspberrypi AdGuardHome[449]: 2019/03/28 14:05:09 [info] Failed to update filter https://hosts-file.net/ad_servers.txt: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:37922->127.0.0.1:53: i/o timeout
Mar 28 14:05:59 raspberrypi AdGuardHome[449]: crypto/rand: blocked for 60 seconds waiting to read random data from the kernel
Mar 28 14:06:09 raspberrypi AdGuardHome[449]: 2019/03/28 14:06:09 [info] Couldn't request filter from URL https://hosts-file.net/ad_servers.txt, skipping: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:58905->127.0.0.1:53: i/o timeout
Mar 28 14:06:09 raspberrypi AdGuardHome[449]: 2019/03/28 14:06:09 [info] Failed to update filter https://hosts-file.net/ad_servers.txt: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:58905->127.0.0.1:53: i/o timeout
Mar 28 14:07:09 raspberrypi AdGuardHome[449]: 2019/03/28 14:07:09 [info] Couldn't request filter from URL https://hosts-file.net/ad_servers.txt, skipping: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:38551->127.0.0.1:53: i/o timeout
Mar 28 14:07:09 raspberrypi AdGuardHome[449]: 2019/03/28 14:07:09 [info] Failed to update filter https://hosts-file.net/ad_servers.txt: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:38551->127.0.0.1:53: i/o timeout
Mar 28 14:08:12 raspberrypi AdGuardHome[449]: 2019/03/28 14:08:12 [info] Couldn't request filter from URL https://hosts-file.net/ad_servers.txt, skipping: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:36919->127.0.0.1:53: i/o timeout
Mar 28 14:08:12 raspberrypi AdGuardHome[449]: 2019/03/28 14:08:12 [info] Failed to update filter https://hosts-file.net/ad_servers.txt: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:36919->127.0.0.1:53: i/o timeout
Mar 28 14:09:11 raspberrypi AdGuardHome[449]: 2019/03/28 14:09:11 [info] Couldn't request filter from URL https://hosts-file.net/ad_servers.txt, skipping: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:33154->127.0.0.1:53: i/o timeout
Mar 28 14:09:11 raspberrypi AdGuardHome[449]: 2019/03/28 14:09:11 [info] Failed to update filter https://hosts-file.net/ad_servers.txt: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:33154->127.0.0.1:53: i/o timeout
Mar 28 14:10:11 raspberrypi AdGuardHome[449]: 2019/03/28 14:10:11 [info] Couldn't request filter from URL https://hosts-file.net/ad_servers.txt, skipping: Get https://hosts-file.net/ad_servers.txt: dial tcp: lookup hosts-file.net on 127.0.0.1:53: read udp 127.0.0.1:37166->127.0.0.1:53: i/o timeout

Still not working after 20 minutes this time, and htop shows the AdGuardHome process with hundreds of threads (vs the usual 8).
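One line in this log stands out: `crypto/rand: blocked for 60 seconds waiting to read random data from the kernel`. Go prints that when a crypto/rand read has been blocked for a minute, which on Linux can happen while the kernel's random pool is still uninitialized shortly after boot, and every TLS handshake (DoT, DoH, HTTPS filter downloads) needs random bytes. Whether this explains the 5-minute window was not confirmed in this thread. A quick way to check on an affected Pi is to time such a read right after boot; `timeRandomRead` is a hypothetical helper.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// timeRandomRead (hypothetical) measures how long reading n bytes from
// crypto/rand takes. On Linux this read can block until the kernel's random
// pool has been initialised, which is most likely early in boot.
func timeRandomRead(n int) (time.Duration, error) {
	buf := make([]byte, n)
	start := time.Now()
	_, err := rand.Read(buf)
	return time.Since(start), err
}

func main() {
	d, err := timeRandomRead(32)
	fmt.Printf("crypto/rand read took %v (err=%v)\n", d, err)
}
```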

saltama commented 5 years ago

And this is the log of the service after a reboot, but WITH verbose enabled: adh3.log.txt.zip

Not much to see other than some requests failing after 200s. Still hundreds of threads in htop. Both these tests use only TLS upstreams, with parallel queries enabled.

But... I got it working right at boot with two DoH servers and parallel queries disabled, for the first time ever.

Edit: Tried again after 2 minutes, it stopped working again. :(

szolin commented 5 years ago

In the log file above there are 1228 attempted connections to 4 upstream servers from function AdguardTeam/dnsproxy/upstream.createDialContext() - 4 connections per 1 DNS request, plus 16 times AGH tried to get an update for a filter.

148 out of those 1228 requests failed with an "i/o timeout" error after ~3 minutes. It means that ~1k requests are still pending (and their socket fds are still open) while they wait for network I/O.
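The numbers above (four dials per query and roughly a thousand still pending) are what an unbounded parallel fan-out looks like when every upstream is unreachable. Here is a hedged Go sketch of the kind of limit saltama asks about further down: a semaphore shared across queries caps the exchanges in flight, and a context deadline cancels the slower upstreams once one answers. `exchangeParallel`, `slots` and `fakeUpstream` are hypothetical; this is not dnsproxy's implementation, and as ameshkov notes, such a cap would only bound the symptom.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// exchangeFunc is a hypothetical stand-in for one upstream (plain, DoT or DoH).
type exchangeFunc func(ctx context.Context, query string) (string, error)

// slots is a counting semaphore shared by all queries: its capacity is the
// maximum number of upstream exchanges allowed in flight at once.
type slots chan struct{}

// exchangeParallel queries all upstreams concurrently, returns the first
// successful answer and cancels the others. timeout bounds the whole fan-out.
func exchangeParallel(ctx context.Context, s slots, ups []exchangeFunc, query string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel() // stops slower upstreams once we return

	type result struct {
		answer string
		err    error
	}
	results := make(chan result, len(ups)) // buffered so no goroutine leaks
	for _, up := range ups {
		go func(up exchangeFunc) {
			select {
			case s <- struct{}{}: // acquire a slot
			case <-ctx.Done(): // gave up while waiting for a slot
				results <- result{err: ctx.Err()}
				return
			}
			defer func() { <-s }() // release the slot
			answer, err := up(ctx, query)
			results <- result{answer, err}
		}(up)
	}
	var lastErr error
	for range ups {
		r := <-results
		if r.err == nil {
			return r.answer, nil
		}
		lastErr = r.err
	}
	return "", fmt.Errorf("all %d upstreams failed, last error: %w", len(ups), lastErr)
}

// fakeUpstream is a hypothetical upstream that answers after delay, or never
// answers (until cancelled) when blackHole is true.
func fakeUpstream(delay time.Duration, answer string, blackHole bool) exchangeFunc {
	return func(ctx context.Context, query string) (string, error) {
		if blackHole {
			<-ctx.Done()
			return "", ctx.Err()
		}
		select {
		case <-time.After(delay):
			return answer + " for " + query, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// runDemo wires three fake upstreams through a two-slot semaphore.
func runDemo(allBlackHoled bool) (string, error) {
	s := make(slots, 2)
	ups := []exchangeFunc{
		fakeUpstream(0, "", true),
		fakeUpstream(10*time.Millisecond, "1.2.3.4", allBlackHoled),
		fakeUpstream(20*time.Millisecond, "5.6.7.8", allBlackHoled),
	}
	return exchangeParallel(context.Background(), s, ups, "example.org", 200*time.Millisecond)
}

func main() {
	fmt.Println(runDemo(false))
	fmt.Println(runDemo(true))
}
```

The trade-off of a cap is that queries wait for a free slot (and may time out) instead of piling up sockets, which adds latency when upstreams are slow.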

So looking at this log I can only see that AGH simply can't reach the Internet servers - they all fail after timeout. I recommend using a network traffic sniffer like tcpdump or wireshark to ensure that those connection attempts are really delivered to the network interface. And if they do, we need to check whether any packets from the upstream servers are delivered back to AGH.

Currently we don't know where the packets get stuck, so the first step is to find where this issue comes from.

ameshkov commented 5 years ago

@planet0 @saltama I just stumbled upon this comment in the similar issue of CF repo: https://github.com/cloudflare/cloudflared/issues/23#issuecomment-471933889

BUT after the connection is restored it still doesn't return DNS responses. I can fix it by pinging 1.0.0.1 (not restarting cloudflared) . This makes me suspect a lower layer issue - routing, perhaps?

Could you please check if it helps to ping 1.1.1.1/1.0.0.1?

saltama commented 5 years ago

Could you please check if it helps to ping 1.1.1.1/1.0.0.1?

Doesn't work for me: the pings go through, but AdGuard is still blocked (after 20 minutes it still is), with hundreds of threads.

It means that 1k requests are still pending (and their socket fd's are still opened) while they are waiting for network I/O.

Should these be there in the first place? I mean, if we already have a thread trying to do I/O to an unresponsive address, should we open more? (Also, shouldn't there be some limit?) But OK, I don't even know what creates those threads or why... never mind.

I recommend using a network traffic sniffer like tcpdump or wireshark to ensure that those connection attempts are really delivered to the network interface.

As a last resort, yes.

ghost commented 5 years ago

Quickly adding that it is possible to ping 1.1.1.1/1.0.0.1 during this time, but it doesn’t affect AGH regardless until 5 minutes after boot.

It was the same behaviour with cloudflared, I couldn’t do anything to fix it until 5 minutes as well for cloudflared to start working.

ameshkov commented 5 years ago

Should these be there in the first place? I mean, if we have another thread trying to make i/o to an unresponding address, should we open more? (also, shouldn't we have some limit?)

@saltama well, yeah, but it will only fix one of the symptoms, it will not fix the root cause.

Also, I think the i/o timeout issue you're experiencing is different from this one.

There are two different issues:

  1. No connection for the first 5 min after RPi reboot.
  2. i/o timeout errors that start to appear after a few minutes of working without issues.

We tried updating our test RPi and installing the same packages, but we still weren't able to reproduce it. So it seems we really need your help to figure out what's wrong.

@planet0 @saltama could you please collect tcpdump for us?

Here's what needs to be done:

  1. Wait until the issue is reproduced.
  2. Run tcpdump -i IFACE -w FILE.pcap where IFACE is the name of the network interface, FILE is the dump filename.
  3. We'll need this file to analyze.
saltama commented 5 years ago

There are two different issues:

100% agree.

I'd prefer to send you the dump only once I'm able to isolate the router and the pi0 in a somewhat proper testbed. I need some time to set it up.

ameshkov commented 5 years ago

Sure, np.

Btw, as I recall, you can limit tcpdump to specific IP addresses so that there will be no risk in leaking anything private (communication with DOH/DOT is encrypted anyways).
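Following that tip, the capture can be restricted to the resolver addresses and DNS ports. A small Go helper that builds such a pcap-filter expression; `captureFilter`, the interface name `eth0` and the output file `agh.pcap` are placeholders, and the IPs are the Cloudflare addresses from the logs earlier in this thread (port 53 is plain DNS, 443 DoH, 853 DNS-over-TLS).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// captureFilter (hypothetical) builds a pcap-filter expression that keeps only
// traffic to or from the given hosts on the given ports. Empty lists add no clause.
func captureFilter(hosts []string, ports []int) string {
	var clauses []string
	if len(hosts) > 0 {
		hs := make([]string, len(hosts))
		for i, h := range hosts {
			hs[i] = "host " + h
		}
		clauses = append(clauses, "("+strings.Join(hs, " or ")+")")
	}
	if len(ports) > 0 {
		ps := make([]string, len(ports))
		for i, p := range ports {
			ps[i] = "port " + strconv.Itoa(p)
		}
		clauses = append(clauses, "("+strings.Join(ps, " or ")+")")
	}
	return strings.Join(clauses, " and ")
}

func main() {
	f := captureFilter([]string{"1.1.1.1", "104.16.112.25"}, []int{53, 443, 853})
	fmt.Printf("sudo tcpdump -i eth0 -w agh.pcap '%s'\n", f)
}
```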

ghost commented 5 years ago

Haven't had the chance to test it yet, but Raspbian had a new update released on April 8, I would like to see if that has made any changes regarding this issue.

Forgot to mention, I'm running Raspbian Lite instead of the desktop version.

ghost commented 5 years ago

Tried with cloudflared, still experiencing the same issue even with the recent updates for Raspbian and cloudflared. Going to assume that AdGuard Home would still experience the same issue then. Mentioned it at https://github.com/cloudflare/cloudflared/issues/91.

szolin commented 5 years ago

@planet0 Does the issue still persist? If yes, how about that tcpdump data for DNS requests? You can use tcpdump's filters (IPs, ports) and give us only related information.

ghost commented 5 years ago

@szolin I haven't forgotten about this! I've found some time to work on this now - would you be able to help me figure out how to get the required tcpdump data from a Raspberry Pi running Raspbian Lite?

Edit: Oops, I forgot about the instructions that @ameshkov provided above. Will get onto this, still sending to devteam[at]adguard.com?

ghost commented 5 years ago

Some good news @szolin @ameshkov, it seems that with Raspbian Buster, I no longer need to wait 5 minutes on boot - network connections are going through just fine. 🎉

szolin commented 5 years ago

Glad to hear!

saltama commented 5 years ago

Can confirm this too!

cwyin7788 commented 4 years ago

I have this problem too. I found that when you reboot the Pi, the systemd-resolved process starts and listens on UDP port 53, so AGH cannot use port 53 and fails to run. I used these commands to disable systemd-resolved:

sudo service systemd-resolved stop
sudo systemctl disable systemd-resolved
sudo reboot

After the reboot, everything is back to normal.

Sorry for my bad English.
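The two port-53 problems raised in this thread look different at the socket level: another process already listening on the port (systemd-resolved here) gives EADDRINUSE, while a non-root process without `cap_net_bind_service` (the Pi-hole/setcap discussion above) gives EACCES on Linux. A hedged, Linux-oriented Go sketch to tell them apart; `udpBindStatus` and `demoInUse` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// udpBindStatus (hypothetical) tries to bind a UDP socket on addr and
// classifies the outcome.
func udpBindStatus(addr string) string {
	pc, err := net.ListenPacket("udp", addr)
	switch {
	case err == nil:
		pc.Close()
		return "ok"
	case errors.Is(err, syscall.EADDRINUSE):
		return "in use" // another process already holds the port
	case errors.Is(err, syscall.EACCES):
		return "permission denied" // port < 1024 without root or CAP_NET_BIND_SERVICE
	default:
		return "error: " + err.Error()
	}
}

// demoInUse (hypothetical) binds an ephemeral port and checks it again while
// it is still held, to show the "in use" case.
func demoInUse() string {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "error: " + err.Error()
	}
	defer pc.Close()
	return udpBindStatus(pc.LocalAddr().String())
}

func main() {
	fmt.Println("0.0.0.0:53 ->", udpBindStatus("0.0.0.0:53"))
	fmt.Println("held port ->", demoInUse())
}
```

Run the port-53 check while AGH is stopped; otherwise AGH's own socket will show up as "in use".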

ghost commented 4 years ago

@i553041 I'm not sure if it's universal, but it seems that on my vanilla install of Raspbian, systemd-resolved is already disabled on my end. But this is still good, thank you.