Closed timkgh closed 1 year ago
Could you enable the verbose logging and see if there are any clues there? Enabling debug profiling (debug_pprof: true
in the configuration file), getting the profile from http://127.0.0.1:6060/debug/pprof/goroutine?debug=1
, and sending them to devteam@adguard.com could also help us see if there are any connection leaks.
Unfortunately I don't have enough space on my router for verbose logging and I'm also worried about flash wear.
Have you checked that it's not a memory issue? If you're on a ER12, that only has 1GB of RAM. It's conceivable that once RAM is used up, it goes to swap
.
Would you be able to show the load
values from uptime
to see whether there's something going on there? When the system starts swapping heavily, it will slow right to a crawl, and services become unresponsive.
As another aside, is the router still behaving as expected? Or is the CPU bottleneck causing router issues too?
It is not a memory problem, the memory use is normally < 300MB (right now 185MB). The router is fine when this happens, except 1 CPU core is at 100% taken by AGH. Under normal use the CPU cores are at < 5%. The AGH web interface works, AGH just stops responding to DNS requests. I don't need to reboot the router to recover, I just need to kill AGH.
# free -h
total used free shared buff/cache available
Mem: 973M 173M 549M 11M 250M 773M
Swap: 0B 0B 0B
uptime: 07:01:30 up 142 days, 19:26, 1 user, load average: 0.25, 0.25, 0.16
I've reverted to .29 for now because that's been stable for me. Problems started with the recent .30
I am having a simiular random issue and am running the latest v0.107.32 on centos-release-7-9.2009.1.el7.centos.x86_64. AGH stops responding to DNS requests, and still able to access the GUI. Restarting the AdGuardHome service does not work i.e. have to force a reboot. Was not able to check CPU/Memory usage yet, but never had issues before the last 1-2 releases. I did observe the logs below, and have increaed the UDP receiver buffer as described here (https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size). Am waiting to see if the issue stops...
[root@dns1 ~]# systemctl status AdGuardHome ● AdGuardHome.service - AdGuard Home: Network-level blocker Loaded: loaded (/etc/systemd/system/AdGuardHome.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2023-06-16 06:40:36 CEST; 13h ago Main PID: 734 (AdGuardHome) CGroup: /system.slice/AdGuardHome.service └─734 /root/AdGuardHome/AdGuardHome -s run
Jun 16 17:18:11 dns1 AdGuardHome[734]: 2023/06/16 17:18:11.470190 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:18:14 dns1 AdGuardHome[734]: 2023/06/16 17:18:14.472600 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:11 dns1 AdGuardHome[734]: 2023/06/16 17:21:11.454448 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:14 dns1 AdGuardHome[734]: 2023/06/16 17:21:14.464525 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:40 dns1 AdGuardHome[734]: 2023/06/16 17:21:40.538860 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:43 dns1 AdGuardHome[734]: 2023/06/16 17:21:43.802327 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:40:34 dns1 AdGuardHome[734]: 2023/06/16 17:40:34.338152 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:40:37 dns1 AdGuardHome[734]: 2023/06/16 17:40:37.342884 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 18:00:19 dns1 AdGuardHome[734]: 2023/06/16 18:00:19.746670 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 18:00:22 dns1 AdGuardHome[734]: 2023/06/16 18:00:22.771022 [error] dnsproxy: unpacking udp packet: dns: buffer size too small [root@dns1 ~]#
I am having a simiular random issue and am running the latest v0.107.32 on centos-release-7-9.2009.1.el7.centos.x86_64. AGH stops responding to DNS requests, and still able to access the GUI. Restarting the AdGuardHome service does not work i.e. have to force a reboot. Was not able to check CPU/Memory usage yet, but never had issues before the last 1-2 releases. I did observe the logs below, and have increaed the UDP receiver buffer as described here (https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size). Am waiting to see if the issue stops...
[root@dns1 ~]# systemctl status AdGuardHome ● AdGuardHome.service - AdGuard Home: Network-level blocker Loaded: loaded (/etc/systemd/system/AdGuardHome.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2023-06-16 06:40:36 CEST; 13h ago Main PID: 734 (AdGuardHome) CGroup: /system.slice/AdGuardHome.service └─734 /root/AdGuardHome/AdGuardHome -s run
Jun 16 17:18:11 dns1 AdGuardHome[734]: 2023/06/16 17:18:11.470190 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:18:14 dns1 AdGuardHome[734]: 2023/06/16 17:18:14.472600 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:11 dns1 AdGuardHome[734]: 2023/06/16 17:21:11.454448 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:14 dns1 AdGuardHome[734]: 2023/06/16 17:21:14.464525 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:40 dns1 AdGuardHome[734]: 2023/06/16 17:21:40.538860 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:21:43 dns1 AdGuardHome[734]: 2023/06/16 17:21:43.802327 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:40:34 dns1 AdGuardHome[734]: 2023/06/16 17:40:34.338152 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 17:40:37 dns1 AdGuardHome[734]: 2023/06/16 17:40:37.342884 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 18:00:19 dns1 AdGuardHome[734]: 2023/06/16 18:00:19.746670 [error] dnsproxy: unpacking udp packet: dns: buffer size too small Jun 16 18:00:22 dns1 AdGuardHome[734]: 2023/06/16 18:00:22.771022 [error] dnsproxy: unpacking udp packet: dns: buffer size too small [root@dns1 ~]#
Did not solve the issue...CPU is at 100%, enough free memory. Reverted back to v0.107.29.
@copacetic82, those errors are likely signalling that some plain-UDP clients are sending invalid DNS packets. We'd appreciate if you could reproduce the situation with verbose logging and profile collecting, as described in https://github.com/AdguardTeam/AdGuardHome/issues/5896#issuecomment-1588896375, if you decide to give it another go.
@copacetic82, those errors are likely signalling that some plain-UDP clients are sending invalid DNS packets. We'd appreciate if you could reproduce the situation with verbose logging and profile collecting, as described in https://github.com/AdguardTeam/AdGuardHome/issues/5896#issuecomment-1588896375, if you decide to give it another go.
Hi...thanks for replying...unfortunately i cant reproduce it at the moment as I am away on vacation. But a revert to v0.107.29 did solve the issue for me...no problems since, and using the same clients. Maybe I have time to debug once I get back.
It's happening to me since the last update. I will try the verbose logging
Same issue. AGH on Ubuntu 22.04 snap v0.107.32 Headless 8700k 32gb Pegged a core at 100% Nothing in logs around time of death in syslog/journal/AdGuardHome.err WebUI works, DNS doesn't. Yikes...
I face the same problem. Never had issues with AdguardHome until a few weeks. I agree this issue must have started with the last releases.
We are still not able to reproduce this properly, and the data we've received so far unfortunately doesn't shed any light on what is happening and why. Anyone still having issues, please help us collect the data by doing the following:
Update to v0.108.0-b.38, which has more debug logging, or a later Edge release.
Enable debug profiling (debug_pprof: true
in the configuration file).
Wait for the issue to occur.
Save the profile from http://127.0.0.1:6060/debug/pprof/goroutine?debug=1
, the verbose log, and send both devteam@adguard.com with “AdGuard Home Issue 5896” in the subject line.
Hello same problem this morning . whith my adguard docker version: v0.108.0-a.586+2069eddf Stop to listen and respond DNS request.
Web UI is working
This problem is during since 1 month !!!
We are still not able to reproduce this properly, and the data we've received so far unfortunately doesn't shed any light on what is happening and why. Anyone still having issues, please help us collect the data by doing the following:
- Update to v0.108.0-b.38, which has more debug logging, or a later Edge release.
- Enable the verbose logging.
- Enable debug profiling (
debug_pprof: true
in the configuration file).- Wait for the issue to occur.
- Save the profile from
http://127.0.0.1:6060/debug/pprof/goroutine?debug=1
, the verbose log, and send both devteam@adguard.com with “AdGuard Home Issue 5896” in the subject line.
Have set this up and left it running.
Thanks to the logs sent by @bigwookie in #5962, we were able to pinpoint a bug in the code which could cause such issues. We've fixed it in version v0.108.0-a.600+c4651647
on the Edge channel. Could you please check if the fix improves the situation?
Thanks to the logs sent by @bigwookie in #5962, we were able to pinpoint a bug in the code which could cause such issues. We've fixed it in version
v0.108.0-a.600+c4651647
on the Edge channel. Could you please check if the fix improves the situation?
@ainar-g I have downloaded this and left it running.
@ainar-g I can confirm it's running better.
I've been having the same issue I believe and managed to sort out how to update to the latest Edge release over the weekend. It stalled again last night (running v0.108.0-a.600+c4651647) and I was able to grab and send a log of the event. Will follow up if it happens again since I neglected to enable the pprof option this time.
@ainar-g Just failed again will get the logs and send on.
just to confirm, i am also getting the same on v0.107.33
A reboot of both fixes is temporarily
LXC ADH set to Primary DNS and Docker is Secondary in DHCP
Ive had to switch the LXC Container off as ive no idea how to downgrade it or even use the Edge channel, but ive set the Docker to use Edge and its pulled v0.108.0-a.600+c4651647.
il monitor it
@timkgh, were you able to try the Edge fix?
Was having a similar problem where 3 adguard instances stopped randomly working. And kept rolling back to 108.30 beta, and with the latest edge version seems to have solved the issue
@timkgh, were you able to try the Edge fix?
No, but it doesn't look like it's fixed based on other reports.
Was having a similar problem where 3 adguard instances stopped randomly working. And kept rolling back to 108.30 beta, and with the latest edge version seems to have solved the issue
Close to 2 days without issues, before wouldn't survive 2h sometimes
I dont understand rollback to which version résolve temporrary the issue??? A fact is my first problem was saturday 10 june Adguard is automatically update whith docker Which version has been updated 10 june
I dont understand rollback to which version résolve temporrary the issue??? A fact is my first problem was saturday 10 june Adguard is automatically update whith docker Which version has been updated 10 june
.29 is stable
.30 also works fine for me.
I dont understand rollback to which version résolve temporrary the issue??? A fact is my first problem was saturday 10 june Adguard is automatically update whith docker Which version has been updated 10 june
Auto updated? Hmm I was expecting to only update when a new image was pulled only and then recreate the container?
You can pull a specific docker image version, that is what I've done to use the .30
docker pull adguard/adguardhome:v0.108.0-b.30
Pulls the specific version and then if you add the label when you create your container should use the image
v0.108.0-a.600+c4651647.
So far so good with this version. (Edge)
@ainar-g was kind enough to review my logs from the other day and determined that I had a different issue. After disabling these two reverse DNS options I have been able to get ~36 hours of issue-free uptime. I'm still on the Edge branch but will move back to the master branch with the same configuration this weekend to confirm.
He recommended that I follow up here in case any of you have a similar issue.
v0.108.0-a.600+c4651647.
So far so good with this version. (Edge)
I dont understand i thought good version is before v0.107.29 and not 108 ??
@ainar-g I can try a v0.107.x build if you can make one with the patch. I don't want to move to 108 yet.
v0.108.0-a.600+c4651647.
So far so good with this version. (Edge)
I dont understand i thought good version is before v0.107.29 and not 108 ??
Both are good. 107.29 is the version before the issue and 108 edge is the patched version.
the edge version is working fine for me so far.
hopefully latest stable release will also work
my docker which was using edge tag has now updated to v0.108.0-a.602+c21f958e from v0.108.0-a.600+c4651647. il keep an eye on it.
how using edge in docker. I have image: adguard/adguardhome:latest and my adguard version is version: v0.107.33 docker pull adguard/adguardhome:edge i think
Ok i have now version: v0.108.0-a.604+7b92d53b If the issue was solved ; what was the problem ?
do we know when the "fix" will be merged to stable and beta branches?
You need v0.108.0-a.600+c4651647 on the edge tag.
I have had it stop once, so it is better but still not as stable as 0.107.29
OK newest edge v0.108.0-a.604+7b92d53b has stopped and showing the same issue seen here.
I have full logs and sending to you @ainar-g
I have a v0.108.0-a.604+7b92d53b and a a600+c4651647 and they are still ok, but I'm load balancing across 3 instances so I could be not pushing it too much. But I am monitoring with an active healthcheck resolving google.com every 10sec or so
All good here on
v0.108.0-a.600+c4651647 v0.108.0-a.602+c21f958e
Running together. Top one is primary dns and Bottom is secondary dns.
Both running in docker on separate hosts.
I have a v0.108.0-a.604+7b92d53b and a a600+c4651647 and they are still ok, but I'm load balancing across 3 instances so I could be not pushing it too much. But I am monitoring with an active healthcheck resolving google.com every 10sec or so
I had the edge running for 3 days without issue and then failed, sometimes 5hrs.
I also run two load balanced, an edge release and a 0.107.29 so I don't lose DNS.
I have a v0.108.0-a.604+7b92d53b and a a600+c4651647 and they are still ok, but I'm load balancing across 3 instances so I could be not pushing it too much. But I am monitoring with an active healthcheck resolving google.com every 10sec or so
I had the edge running for 3 days without issue and then failed, sometimes 5hrs.
I also run two load balanced, an edge release and a 0.107.29 so I don't lose DNS.
Same here 3 docker instances in dif VMs:
Failed again v0.108.0-a.604+7b92d53b sent full logs again to support.
if i understand well the problem seems to be not resolve ?
@bigwookie, thanks again for the logs! We have found another potential issue and we're working on a fix.
@bigwookie, thanks again for the logs! We have found another potential issue and we're working on a fix.
Are these issues related to the browsing security and parental control services, so disabling them would prevent the issue from occurring?
Yes and yes.
Version v0.108.0-a.605+c02a1411
on the Edge channel contains the new fix. @bigwookie, could you please try that one out as well?
Version
v0.108.0-a.605+c02a1411
on the Edge channel contains the new fix. @bigwookie, could you please try that one out as well?
I am now running Version: v0.108.0-a.605+c02a1411
Prerequisites
[X] I have checked the Wiki and Discussions and found no answer
[X] I have searched other issues and found no duplicates
[X] I want to report a bug and not ask a question
Operating system type
Other (please mention in the description)
CPU architecture
64-bit MIPS
Installation
Other (please mention in the description)
Setup
On a router, DHCP is handled by the router
AdGuard Home version
v0.107.31
Description
v0.107.31 running on EdgeRouter 12, mips64_softfloat
Sometimes it just hangs, it doesn't respond to DNS requests anymore, 1 CPU is at 100% and it also does not gracefully restart via
systemctl restart
, it needs to be force killed.I know this is not a lot to go on, these random hangs are hard to troubleshoot. It was stable for a while, I believe this started with the updates to .30 and .31