PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

memleak in pdns auth, even in 4.1.0-rc1 with PR 5777 applied #5782

Closed foxxx0 closed 7 years ago

foxxx0 commented 7 years ago

Short description

It seems that pdns auth is leaking like a sieve when performing lots of AXFR queries, even with a very small number of zones and records.

Environment

Steps to reproduce

I am not sure whether at least one DNSSEC-signed zone is required to trigger this.

  1. start pdns_server
  2. perform lots of AXFR requests, e.g. using the following snippet:
     for i in {1..10}; do pdnsutil list-all-zones master 2>/dev/null | while read -r zone; do dig axfr "$zone" @localhost &>/dev/null; done; done
  3. watch the memory consumption rise quickly and significantly (~40 MiB for me with each iteration of the above loop)
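
To put numbers on step 3, the loop from step 2 can be wrapped so that it prints the resident set size of pdns_server after every round. This is only a minimal sketch and not part of the original report: it assumes Linux with /proc, that pidof is installed, and that reading /proc/<pid>/status is permitted. The function names rss_kib and axfr_rss_trend are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: Linux /proc, pidof, dig and pdnsutil available.
# rss_kib and axfr_rss_trend are hypothetical helper names.

# Print the resident set size (VmRSS) of the given PID in KiB.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Run the given number of AXFR rounds over all master zones and
# print the RSS of pdns_server before the first and after each round.
axfr_rss_trend() {
    local rounds=$1 pid zone i
    pid=$(pidof -s pdns_server) || { echo "pdns_server is not running" >&2; return 1; }
    echo "start: $(rss_kib "$pid") KiB"
    for ((i = 1; i <= rounds; i++)); do
        pdnsutil list-all-zones master 2>/dev/null | while read -r zone; do
            dig axfr "$zone" @localhost &>/dev/null
        done
        echo "round $i: $(rss_kib "$pid") KiB"
    done
}

# Example call (not executed here): axfr_rss_trend 10
```

If the numbers keep climbing by a similar amount per round and never level out, that matches the behaviour described in this report.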

Expected behaviour

Seeing the memory consumption level out at some point, not increasing any further.

Actual behaviour

Memory consumption of pdns_server keeps rising and rising, so it's likely leaking somewhere.

Other information

Some stats for my setup:

Zones: 69 (including 1 DNSSEC-enabled zone)

Records: 1292

(The whole PostgreSQL database is less than 10 MiB in total...)

pdns_server logfile run through valgrind: https://paste.foxxx0.de/C87T/

pdns.conf: https://paste.foxxx0.de/jDZd/

Metronome URL: https://metronome1.powerdns.com/?server=pdns.foxxx0.auth&beginTime=-86400

foxxx0 commented 7 years ago

Update after some more testing in the IRC channel:

Apparently this memleak can even be reproduced by continuously doing AXFR for the same zone, and this zone doesn't need to be DNSSEC-secured.

A first idea in IRC was that this is caused by the SQL backends (I'm running PostgreSQL, and someone else was able to reproduce the leak with MySQL).

If I understand the current procedure for AXFR correctly, a new SQL connection is opened upon each and every AXFR request. So maybe this is the actual source of the leak.

foxxx0 commented 7 years ago

Just because I was curious and had no idea if it would have any influence whatsoever, I tried something:

Starting pdns_server with LD_PRELOAD=/usr/lib/libjemalloc.so fixes the leak. I have now run several thousand AXFR queries without any significant increase in memory consumption (roughly 1 MiB in total with jemalloc, compared to more than 40 MiB per loop iteration without it).
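
When repeating this test it is worth confirming that the preloaded allocator really ended up in the process, since a wrong library path in LD_PRELOAD only produces a warning. The sketch below checks /proc/<pid>/maps for a given string. It assumes Linux and sufficient permissions to read the maps file of the pdns_server process; is_mapped is a hypothetical helper name, and the library path is the one from the comment above and may differ on other systems.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: Linux /proc, permission to read /proc/<pid>/maps.
# is_mapped is a hypothetical helper name.

# Succeed if any line of /proc/<pid>/maps contains the given fixed string.
is_mapped() {
    grep -qF -- "$2" "/proc/$1/maps"
}

# Example (not executed here), with pdns_server started as
#   LD_PRELOAD=/usr/lib/libjemalloc.so pdns_server
#
# if is_mapped "$(pidof -s pdns_server)" libjemalloc; then
#     echo "jemalloc is loaded"
# else
#     echo "jemalloc is NOT loaded, still using the default allocator"
# fi
```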

Now I leave it up to the pros to make some sense out of that :)

foxxx0 commented 7 years ago

AXFR stresstest using jemalloc:

[Image: stresstest with jemalloc]

And the same stresstest using the default libc allocator:

[Image: stresstest using the default libc allocator]

I think you can spot the difference quite easily.

foxxx0 commented 7 years ago

That might be interesting: https://www.zerotier.com/blog/2017-05-05-theleak.shtml

rgacogne commented 7 years ago

For the record, it looks like it's a bug in glibc 2.26's per-thread cache:

foxxx0 commented 7 years ago

I have successfully confirmed that a glibc version including that commit indeed fixes the leak completely.
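
For anyone who cannot upgrade glibc right away, one way to test the per-thread cache hypothesis is to switch that cache off through the glibc tunables. The following is a sketch only and was not tried in this thread: it assumes a glibc that honours GLIBC_TUNABLES, where setting glibc.malloc.tcache_count to 0 disables the per-thread cache. is_glibc_2_26 is a hypothetical helper name, and it only checks the version string; it cannot tell whether a distribution has backported a fix.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: glibc with tunables support.
# is_glibc_2_26 is a hypothetical helper name.

# Succeed if the argument (as printed by `getconf GNU_LIBC_VERSION`)
# names glibc 2.26 or a 2.26.x point release.
is_glibc_2_26() {
    case $1 in
        "glibc 2.26" | "glibc 2.26."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
#
# if is_glibc_2_26 "$(getconf GNU_LIBC_VERSION)"; then
#     # tcache_count=0 disables the per-thread cache for this process only
#     GLIBC_TUNABLES=glibc.malloc.tcache_count=0 pdns_server
# fi
```

If memory use levels out with the cache disabled, that points at the same cause as the jemalloc experiment above.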

Thanks to all for your cooperation and explanations over in #powerdns on OFTC!