PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.67k stars 906 forks source link

High CPU usage after receiving mass notify. #7579

Open george2asenov opened 5 years ago

george2asenov commented 5 years ago

Short description

After receiving many NOTIFY queries at once (like when restarting BIND master with more than 1000 domains in this case) PDNS (working as slave - BIND -> PDNS superslave/supermaster -> PDNS slave yes it happen on the two PDNS servers in the chain ) and MYSQL start to consume ~100% constantly till next restart. This continues even after all AXFR-s are completed and log stop logging anything ( at loglevel=6) and no DNS queries (just in test environment). Just doing nothing but keep CPU consumption high.

issue: https://github.com/PowerDNS/pdns/issues/622 is the same but appears to be resolved back then. Apparently it is not.

Environment

Steps to reproduce

(Lua) scripts that are loaded. --> none

  1. configure PDNS with gmysql or gsqlite3
  2. configure PDNS as superslave of BIND with more than for example 500 domains
  3. restart the MASER BIND server so it send notifications for all domains.
  4. wait all AXFR-s complete
  5. watch CPU start using ~1 core constanty

Expected behaviour

After all AXFR complete pdns to become idle and consume ~0%CPU

Actual behaviour

PDNS stuck on using too much cpu for an idling service. strace:

[pid 21655] write(29, "!\0\0\0\27\2\0\0\0\0\1\0\0\0\0\1\376\0\376\0\3SOA\obfusc-domain3.com", 37) = 37
[pid 21655] read(29, "\1\0\0\1\0106\0\0\2\3def\4pdns\7records\7records\7content\7content\f!\0\0\356\2\0\375\0\0\0\0\0.\0\0\3\3def\4pdns\7records\7records\3ttl\3ttl\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\4\3def\4pdns\7records\7records\4prio\4prio\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\5\3def\4pdns\7records\7records\4type\4type\f!\0\36\0\0\0\375\0@\0\0\0:\0\0\6\3def\4pdns\7records\7records\tdomain_id\tdomain_id\f?\0\v\0\0\0\3\10@\0\0\0008\0\0\7\3def\4pdns\7records\7records\10disabled\10disabled\f?\0\1\0\0\0\1\0\0\0\0\0000\0\0\10\3def\4pdns\7records\7records\4name\4name\f!\0\375\2\0\0\375\10@\0\0\0000\0\0\t\3def\4pdns\7records\7records\4auth\4auth\f?\0\1\0\0\0\1\0\0\0\0\0\5\0\0\n\376\0\0\2\0q\0\0\v\0\0\0Nns34.nsdomain.tld root.s15.nsdomain.tld 1537911317 10800 3600 604800 38400\0\226\0\0\0\0\0\0\3SOA7\6\0\0\0\obfusc-domain3.com\1\5\0\0\f\376\0\0\2\0", 16384) = 578
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] write(29, "\5\0\0\0\32\2\0\0\0", 9) = 9
[pid 21655] read(29, "\7\0\0\1\0\0\0\2\0\0\0", 16384) = 11
[pid 21655] fcntl(29, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 21655] fcntl(29, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 21655] fcntl(29, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 21655] recvfrom(29, 0x7fe49f1c6cc7, 1, MSG_PEEK, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 21655] fcntl(29, F_GETFL)          = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 21655] fcntl(29, F_SETFL, O_RDWR)  = 0
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] write(29, "%\0\0\0\27\2\0\0\0\0\1\0\0\0\0\1\376\0\376\0\3SOA\20obfusc-domain2.com", 41) = 41
[pid 21655] read(29, "\1\0\0\1\0106\0\0\2\3def\4pdns\7records\7records\7content\7content\f!\0\0\356\2\0\375\0\0\0\0\0.\0\0\3\3def\4pdns\7records\7records\3ttl\3ttl\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\4\3def\4pdns\7records\7records\4prio\4prio\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\5\3def\4pdns\7records\7records\4type\4type\f!\0\36\0\0\0\375\0@\0\0\0:\0\0\6\3def\4pdns\7records\7records\tdomain_id\tdomain_id\f?\0\v\0\0\0\3\10@\0\0\0008\0\0\7\3def\4pdns\7records\7records\10disabled\10disabled\f?\0\1\0\0\0\1\0\0\0\0\0000\0\0\10\3def\4pdns\7records\7records\4name\4name\f!\0\375\2\0\0\375\10@\0\0\0000\0\0\t\3def\4pdns\7records\7records\4auth\4auth\f?\0\1\0\0\0\1\0\0\0\0\0\5\0\0\n\376\0\0\2\0u\0\0\v\0\0\0Nns34.nsdomain.tld root.s15.nsdomain.tld 1537562345 10800 3600 604800 38400\0\226\0\0\0\0\0\0\3SOA8\6\0\0\0\20obfusc-domain2.com\1\5\0\0\f\376\0\0\2\0", 16384) = 582
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] write(29, "\5\0\0\0\32\2\0\0\0", 9) = 9
[pid 21655] read(29, "\7\0\0\1\0\0\0\2\0\0\0", 16384) = 11
[pid 21655] fcntl(29, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 21655] fcntl(29, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 21655] fcntl(29, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 21655] recvfrom(29, 0x7fe49f1c6cc7, 1, MSG_PEEK, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 21655] fcntl(29, F_GETFL)          = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 21655] fcntl(29, F_SETFL, O_RDWR)  = 0
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] write(29, "'\0\0\0\27\2\0\0\0\0\1\0\0\0\0\1\376\0\376\0\3SOA\22obfusc-domain4.com", 43) = 43
[pid 21655] read(29, "\1\0\0\1\0106\0\0\2\3def\4pdns\7records\7records\7content\7content\f!\0\0\356\2\0\375\0\0\0\0\0.\0\0\3\3def\4pdns\7records\7records\3ttl\3ttl\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\4\3def\4pdns\7records\7records\4prio\4prio\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\5\3def\4pdns\7records\7records\4type\4type\f!\0\36\0\0\0\375\0@\0\0\0:\0\0\6\3def\4pdns\7records\7records\tdomain_id\tdomain_id\f?\0\v\0\0\0\3\10@\0\0\0008\0\0\7\3def\4pdns\7records\7records\10disabled\10disabled\f?\0\1\0\0\0\1\0\0\0\0\0000\0\0\10\3def\4pdns\7records\7records\4name\4name\f!\0\375\2\0\0\375\10@\0\0\0000\0\0\t\3def\4pdns\7records\7records\4auth\4auth\f?\0\1\0\0\0\1\0\0\0\0\0\5\0\0\n\376\0\0\2\0w\0\0\v\0\0\0Nns34.nsdomain.tld root.s15.nsdomain.tld 1537230109 10800 3600 604800 38400\0\226\0\0\0\0\0\0\3SOA9\6\0\0\0\22obfusc-domain4.com\1\5\0\0\f\376\0\0\2\0", 16384) = 584
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] write(29, "\5\0\0\0\32\2\0\0\0", 9) = 9
[pid 21655] read(29, "\7\0\0\1\0\0\0\2\0\0\0", 16384) = 11
[pid 21655] fcntl(29, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 21655] fcntl(29, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 21655] fcntl(29, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 21655] recvfrom(29, 0x7fe49f1c6cc7, 1, MSG_PEEK, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 21655] fcntl(29, F_GETFL)          = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 21655] fcntl(29, F_SETFL, O_RDWR)  = 0
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] poll([{fd=29, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 21655] write(29, "#\0\0\0\27\2\0\0\0\0\1\0\0\0\0\1\376\0\376\0\3SOA\16obfusc-domain1.com", 39) = 39
[pid 21655] read(29, "\1\0\0\1\0106\0\0\2\3def\4pdns\7records\7records\7content\7content\f!\0\0\356\2\0\375\0\0\0\0\0.\0\0\3\3def\4pdns\7records\7records\3ttl\3ttl\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\4\3def\4pdns\7records\7records\4prio\4prio\f?\0\v\0\0\0\3\0\0\0\0\0000\0\0\5\3def\4pdns\7records\7records\4type\4type\f!\0\36\0\0\0\375\0@\0\0\0:\0\0\6\3def\4pdns\7records\7records\tdomain_id\tdomain_id\f?\0\v\0\0\0\3\10@\0\0\0008\0\0\7\3def\4pdns\7records\7records\10disabled\10disabled\f?\0\1\0\0\0\1\0\0\0\0\0000\0\0\10\3def\4pdns\7records\7records\4name\4name\f!\0\375\2\0\0\375\10@\0\0\0000\0\0\t\3def\4pdns\7records\7records\4auth\4auth\f?\0\1\0\0\0\1\0\0\0\0\0\5\0\0\n\376\0\0\2\0s\0\0\v\0\0\0Nns34.nsdomain.tld root.s15.nsdomain.tld 1537229316 10800 3600 604800 38400\0\226\0\0\0\0\0\0\3SOA:\6\0\0\0\16obfusc-domain1.com\1\5\0\0\f\376\0\0\2\0", 16384) = 580

this continue indefinitely with different zones.

pdns.conf: config-dir=/etc/powerdns daemon=yes allow-axfr-ips=XXX.XXX.XXX.XXX/32 disable-axfr=no guardian=yes local-address=0.0.0.0 local-port=53 log-dns-details=on loglevel=6 module-dir=/usr/lib64/pdns master=no slave=yes slave-cycle-interval=120 setgid=pdns setuid=pdns socket-dir=/var/run version-string=powerdns launch=gmysql gmysql-host=localhost gmysql-user=pdns gmysql-dbname=pdns gmysql-password=@@@@@@ slave-renotify=yes only-notify= also-notify=XXX.XXX.XXX.XXX webserver=yes webserver-address=XXX.XXX.XXX.XXX webserver-allow-from=XXX.XXX.XXX.XXX api=yes api-key=TTTTTT gmysql-dnssec=yes retrieval-threads=4 receiver-threads=4 signing-threads=4 distributor-threads=4 soa-refresh-default=86000

no DNSSEC enabled on any domain.

george2asenov commented 5 years ago

even bind set with 88 zones and master for PDNS slave when restarted PDNS start looping and using all of the CPU. Also start repeating every second in the log: No new unfresh slave domains, 0 queued for AXFR already, 0 in progress Zone 'domain1.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 Zone 'domain2.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 Zone 'domain3.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 Zone 'domain4.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 ..... No new unfresh slave domains, 0 queued for AXFR already, 0 in progress ..... Zone 'domain1.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 Zone 'domain2.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 Zone 'domain3.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085 Zone 'domain4.com' is on the list of failed SOA checks. Skipping SOA checks until 1552663085

and never stop until restart.

mind04 commented 5 years ago

Please enable query logging in your backend and show us the result.

Since this is a test environment, please verify if this issue is still present in auth-4.2.0 beta1. A lot has changed in this area between 4.1.6 and the 4.2.0 beta. Don't forget to add a supermaster=yes in you config when you try 4.2.0.

Packages are available from https://downloads.powerdns.com/autobuilt_browser/#/auth/4.2.0-beta1

george2asenov commented 5 years ago

Hello,

Here is the log. pdns_server.log This log is made with no access to port 53 from outside except one BIND master (so no DNS queries and no other notifications ). You can see that after the notify and the transfer completes the mysql queries continue with 250 queries per second! I didn't have the time to test with 4.2.0 but I will and will let you know the results.

george2asenov commented 5 years ago

I have tested with the 4.2.0 beta release and it behaves the same. If it receive mass notify it starts repeating almost every second

Mar 26 03:24:05 cent7 pdns_server: Zone 'domain68818.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain41903.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain92984.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain96516.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain60143.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain81484.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain67483.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain09248.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain85737.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain01922.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain56140.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain85547.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain81831.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain94173.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain83262.com' is on the list of failed SOA checks. Skipping SOA checks until 1553588703 Mar 26 03:24:05 cent7 pdns_server: Zone 'domain85207.com' is on the list of failed SOA checks. Skipping SOA checks until 155358870 .... If I activate the Query logging when it stop logging the previous it start repeating this query with the same ID "139677571072384" all the time:

Mar 26 04:57:19 Query 139677571072384: SELECT content,ttl,prio,type,domain_id,disabled,name,auth FROM records WHERE disabled=0 and type=? and name=? Mar 26 04:57:19 Query 139677571072384: 97 usec to execute Mar 26 04:57:19 Query 139677571072384: 135 total usec to last row

then the first one again and repeat.

omoerbeek commented 5 years ago

So far unable to reproduce using a pdns master and slave. Will setup a bind master later.

BTW, the XXX ip addresses in your example config might hide a problem. It could be that self-notification or a notification loop plays a role. We ask people to supply all the details with good reason. See https://blog.powerdns.com/2016/01/18/open-source-support-out-in-the-open/

omoerbeek commented 5 years ago

With bind as a master and 1000 domains I still cannot reproduce. That is, I see a peak in CPU usage with maybe a bit excessive logging, but that lasts only a short time.

The only thing I can advise is tor try to find what is different in your setup to try to zoom in on the actual cause.

The comment about sharing a complete unredacted config still applies.

peterthomassen commented 5 years ago

As it appears to be a bit difficult to arrive at a setup that reproduces this, I created a docker-compose setup that reliably causes this: https://github.com/peterthomassen/pdns-cluster (It's just a standard master-slave setup, so one can also use it for a general pdns docker project.)

Steps to reproduce:

  1. Set things up:
    • git clone https://github.com/peterthomassen/pdns-cluster && cd pdns-cluster/
    • docker-compose build
    • docker-compose up
  2. Then in another terminal, do something like for i in $(seq 1 1000); do echo '{"name": "'$i'.foobar.test.", "kind": "MASTER", "nameservers": ["ns1.dns.test.", "ns2.dns.test."]}' | curl -sS -v -X POST -d@- http://localhost:8081/api/v1/servers/localhost/zones -H "X-API-Key: nsmasterapikey"; done to created a thousand domains
  3. While or after creating the domains, watch the MySQL query log (it is turned on to log to a file in the dbslave container): docker-compose exec dbslave tail -f /var/log/mysql/log

You will see that a large number of Reset stmt will be logged by the database. Once things have calmed down, /var/log/mysql/log will be about 1.5M lines long, most of which will be the Reset stmt queries.

You can trigger this again any time by sending a notify:

  1. for i in $(seq 1 1000); do echo $i; curl -X PUT http://localhost:8081/api/v1/servers/localhost/zones/$i.foobar.test./notify -H "X-API-Key: nsmasterapikey"; done

The number of queries is not linear in the number of zones. My experiments show 220 statements for 10 zones, about 10,000 statements for 100 zones, and about 1,000,000 statements for 1,000 zones. I stopped the measurement for 10,000 zones at some point (although this is my actual use case).

omoerbeek commented 5 years ago

Thanks. I'll investigate further, hopefully soon.

omoerbeek commented 5 years ago

The problem is in CommunicatorClass::mainloop() Each time a notification is received, the d_any_sem is posted (incremented). This triggers a slave out of date check for all slaves for each increment in the (tight) outer loop.

Looking into the best way to fix this.

peterthomassen commented 5 years ago

At the risk of having misunderstood: That sounds like the observed database requests should be SELECT and not Reset, and also their number should be proportional to the number of notifications, right? My investigations further up show different behavior.

omoerbeek commented 5 years ago

I cannot explain the resets yet. I'm using sqlite, but i will install mysql shortly.

The hypothesis is that N notifications lead t N times checking of all slave data. Hence the quadratic behaviour (number of notifications * number of zones). If you notify all zones that's N^2.

Thew stack trace showed the thread to be in getUnfreshSlaveInfos() almost all the time.

omoerbeek commented 5 years ago

With mariadb I'm seeing the selects, but each select is paired with a single reset. I'm not seeing a load of resets at the end as you do....

After some thinking and playing it turned out that the mariadb query cache is causing selects to not appear in the log. If I enable the query cache i'm seeing the same patterns as you do.

Fix for master branch upcoming.

Habbie commented 5 years ago

7790 has a simple fix that we'll release as part of auth 4.2.0. I'll leave this ticket open for a more thorough rework of the logic after 4.2.0.

aldem commented 3 years ago

Not sure if this is same issue, but on pdns slave I am getting 100% CPU usage for ca. 10s after each notify - even for single zone (pdns 4.4.1, gsqlite backend, ca. 192K domains and 1.4M records).

Is there any real reason to check all slave zones freshness when notify is affecting only one (or even some)?