AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0

High CPU Usage #2871

Open · TrueConfiguration opened this issue 3 years ago

TrueConfiguration commented 3 years ago

Issue Details

Expected Behavior

AGH should just block domains and answer DNS queries via DoH and DoT.

Actual Behavior

AGH runs fine until asked to update the filter lists, then CPU usage spikes (the 15-minute load average goes over 3.00) and the DNS and web interfaces become unresponsive. This behavior goes on for hours; the last time I tried, it spent 5 hours like this. Looking through GitHub issues, I found the option to pass the --no-mem-optimization parameter, which indeed made things better (the 15-minute load average went down to 1.80), but DNS is still unresponsive.

Additional Information

Aside from AGH, Unbound, and Certbot, there is nothing running in the VM. I started with AGH installed on the same Raspberry Pi 3B+ I used for Pi-hole, and when it showed the same behavior, I decided to try it on the VM, where I did a clean install and did not even reuse the same AdGuard.yaml. Only two clients were ever connected to the VM's AGH: my PC via YogaDNS and my phone via Android's Private DNS setting. I have around 60 online filter lists (including whitelists), which should amount to roughly 4 to 5 million non-unique domains, give or take.

ameshkov commented 3 years ago

To troubleshoot this issue, we need to see the AdGuard Home logs.

  1. Configure AdGuard Home to write a verbose-level log.
  2. Reproduce the issue.
  3. Post the log file here.

TrueConfiguration commented 3 years ago

Here is the log:

https://pastebin.com/xfRarPE0

ameshkov commented 3 years ago

It seems that filtering-engine initialization takes a lot of time due to the size of the lists. Do you really need to use such huge blocklists?

@ainar-g on a side note, we need to add logging to initFiltering; even in the debug log, nothing is printed when we actually start initializing the engine.
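
For illustration, a minimal Go sketch of the kind of timing log meant here, assuming a hypothetical `initFilteringEngine` function and placeholder list paths (this is not AdGuard Home's actual code):

```go
package main

import (
	"log"
	"time"
)

// initFilteringEngine stands in for the real engine setup; the name,
// signature, and body here are hypothetical.
func initFilteringEngine(lists []string) error {
	// ... parse the lists and build the filtering engine ...
	return nil
}

func main() {
	lists := []string{"list1.txt", "list2.txt"} // placeholder paths

	// Log both the start and the duration, so slow initialization is
	// visible in the debug log instead of looking like a silent hang.
	log.Printf("filtering: initializing engine from %d lists", len(lists))
	start := time.Now()

	if err := initFilteringEngine(lists); err != nil {
		log.Fatalf("filtering: init failed: %v", err)
	}

	log.Printf("filtering: engine initialized in %s", time.Since(start))
}
```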

TrueConfiguration commented 3 years ago

I don't really know. In Pi-hole I used a much bigger list (a little over 4 million unique domains), and since it never had a problem with it, I assumed AGH would be just fine, even though these are not the same lists.

Is there no way to mitigate the problem and use AGH with (very) comprehensive lists like this?

TrueConfiguration commented 3 years ago

I would like to make a final addendum. There are five specific lists that really look like overkill; each one alone has over 1.7 million domains. They probably contain a fair amount of domains duplicated from the other lists, so I disabled them, and AGH loaded in a second, as expected.

Maybe the problem is not the number of lists, but whether one (or some) of these lists are too big.
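
In case it helps anyone checking their own lists, here is a rough standalone Go sketch (not part of AGH) for estimating how much several lists duplicate each other; the file names are placeholders, and every non-empty, non-comment line is treated as one entry, which is a simplification of real filter syntax:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	files := []string{"biglist1.txt", "biglist2.txt"} // placeholder paths

	seen := make(map[string]struct{})
	total := 0

	for _, name := range files {
		f, err := os.Open(name)
		if err != nil {
			log.Fatalf("opening %s: %v", name, err)
		}

		sc := bufio.NewScanner(f)
		sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long lines
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			// Skip blanks and common comment markers.
			if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "!") {
				continue
			}
			total++
			seen[line] = struct{}{}
		}
		if err := sc.Err(); err != nil {
			log.Fatalf("reading %s: %v", name, err)
		}
		f.Close()
	}

	fmt.Printf("total entries: %d, unique: %d, duplicates: %d\n",
		total, len(seen), total-len(seen))
}
```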

ainar-g commented 3 years ago

There are definitely things that need to be optimised in the filter parsing code. I'm not sure what the realistic time frame here is, but we do need to add some better logging at the very least.
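
As a starting point for that logging, a small Go sketch of per-list parse timing, with a stand-in `countRules` function and placeholder paths rather than the actual filter-parsing code:

```go
package main

import (
	"bufio"
	"log"
	"os"
	"strings"
	"time"
)

// countRules streams one filter file and counts non-comment lines; it is a
// stand-in for real rule parsing, used only to show where per-list timing
// logs would go.
func countRules(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	n := 0
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "!") {
			continue
		}
		n++
	}
	return n, sc.Err()
}

func main() {
	lists := []string{"list1.txt", "list2.txt"} // placeholder paths

	for _, path := range lists {
		start := time.Now()
		n, err := countRules(path)
		if err != nil {
			log.Printf("filter %s: parse error: %v", path, err)
			continue
		}
		// Per-list timing makes it obvious which list dominates startup.
		log.Printf("filter %s: %d rules parsed in %s", path, n, time.Since(start))
	}
}
```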