PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

rec: RPZs use more memory than strictly needed #13827

Closed: ousatov-ua closed this issue 6 months ago.

ousatov-ua commented 6 months ago

Short description

I'm using the following RPZ blocklists:

rpzFile("/opt/powerdns/blocklists/oisd-nsfw.rpz", {ignoreDuplicates = true})
rpzFile("/opt/powerdns/blocklists/hagezy-anti-privacy.rpz", {ignoreDuplicates = true})
rpzFile("/opt/powerdns/blocklists/hagezy-gambling.rpz", {ignoreDuplicates = true})
rpzFile("/opt/powerdns/blocklists/hagezy-multi-normal.rpz", {ignoreDuplicates = true})
rpzFile("/opt/powerdns/blocklists/hagezy-no-safe-search.rpz", {ignoreDuplicates = true})
rpzFile("/opt/powerdns/blocklists/hagezy-threat.rpz", {ignoreDuplicates = true})

These RPZ blocklists are downloaded periodically with the following script:

curl -o "/opt/powerdns/blocklists/_hagezy-multi-normal.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/multi.txt"
curl -o "/opt/powerdns/blocklists/_hagezy-gambling.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/gambling.txt"
curl -o "/opt/powerdns/blocklists/_oisd-nsfw.rpz" "https://nsfw.oisd.nl/rpz"
curl -o "/opt/powerdns/blocklists/_hagezy-anti-privacy.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/anti.piracy.txt"
curl -o "/opt/powerdns/blocklists/_hagezy-no-safe-search.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/nosafesearch.txt"
curl -o "/opt/powerdns/blocklists/_hagezy-threat.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/tif.txt"

mv /opt/powerdns/blocklists/_hagezy-multi-normal.rpz /opt/powerdns/blocklists/hagezy-multi-normal.rpz
mv /opt/powerdns/blocklists/_hagezy-gambling.rpz /opt/powerdns/blocklists/hagezy-gambling.rpz
mv /opt/powerdns/blocklists/_oisd-nsfw.rpz /opt/powerdns/blocklists/oisd-nsfw.rpz
mv /opt/powerdns/blocklists/_hagezy-anti-privacy.rpz /opt/powerdns/blocklists/hagezy-anti-privacy.rpz
mv /opt/powerdns/blocklists/_hagezy-no-safe-search.rpz /opt/powerdns/blocklists/hagezy-no-safe-search.rpz
mv /opt/powerdns/blocklists/_hagezy-threat.rpz /opt/powerdns/blocklists/hagezy-threat.rpz

rec_control --timeout=60 reload-lua-config /etc/powerdns/blocklists.lua
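A side note on the script: `curl -o` without `-f` writes an HTTP error page to the target file just like a real list, and the following `mv` then installs it. A more defensive variant might look like the sketch below. It assumes bash, GNU coreutils and curl. The helper names `looks_like_rpz` and `fetch_rpz`, the SOA-based sanity check, and the `--run` switch are illustrative choices, not part of PowerDNS.

```bash
#!/usr/bin/env bash
# Sketch (not the reporter's script): refresh RPZ files one by one and only
# reload the recursor if every download succeeded.
set -euo pipefail

# Hypothetical sanity check: the file is non-empty and contains an SOA record,
# which RPZ zone files normally carry. Adjust to the lists you actually use.
looks_like_rpz() {
  [[ -s "$1" ]] && grep -q 'SOA' "$1"
}

# Hypothetical helper: download URL into a temp file next to DEST, check it,
# then rename it over DEST. A rename within one filesystem is atomic, so the
# recursor never reads a half-written list.
fetch_rpz() {
  local url=$1 dest=$2 tmp
  tmp=$(mktemp "$(dirname "$dest")/.rpz.XXXXXX")
  # -f: treat HTTP errors as failures instead of saving the error page.
  if curl -fsSL --retry 3 -o "$tmp" "$url" && looks_like_rpz "$tmp"; then
    chmod 0644 "$tmp"   # mktemp creates 0600; keep the file readable by pdns
    mv -f "$tmp" "$dest"
  else
    rm -f "$tmp"
    echo "failed to fetch $url, keeping old $dest" >&2
    return 1
  fi
}

main() {
  local dir=/opt/powerdns/blocklists
  local hagezi=https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz
  # With set -e, the first failed fetch aborts the script before the reload.
  # Lists fetched before the failure have already been replaced on disk.
  fetch_rpz "$hagezi/multi.txt"        "$dir/hagezy-multi-normal.rpz"
  fetch_rpz "$hagezi/gambling.txt"     "$dir/hagezy-gambling.rpz"
  fetch_rpz "https://nsfw.oisd.nl/rpz" "$dir/oisd-nsfw.rpz"
  fetch_rpz "$hagezi/anti.piracy.txt"  "$dir/hagezy-anti-privacy.rpz"
  fetch_rpz "$hagezi/nosafesearch.txt" "$dir/hagezy-no-safe-search.rpz"
  fetch_rpz "$hagezi/tif.txt"          "$dir/hagezy-threat.rpz"
  rec_control --timeout=60 reload-lua-config /etc/powerdns/blocklists.lua
}

# Only do the real work when called with --run, so the helpers can be
# sourced or tested on their own.
if [[ ${1:-} == --run ]]; then
  main
fi
```

Invoked as `refresh-rpz.sh --run` (hypothetical name) from cron or a systemd timer. Because each list is renamed into place only after it passes the check, a failed download leaves the previous copy untouched.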

My observation: memory usage grows by about 500 MB on each of the first two reloads:
Initial start: ~500 MB
1st RPZ reload: ~1000 MB
2nd RPZ reload: ~1500 MB

Subsequent reloads do not increase memory consumption further.
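To reproduce numbers like these in a comparable way, one could sample the recursor's resident set size (RSS) right after each reload. A minimal sketch, assuming Linux `/proc`, a single running process named `pdns_recursor`, and an arbitrary 30-second settling delay; the `rss_kib` helper and the `--run` switch are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: reload the RPZ configuration N times and print the recursor's RSS
# (resident set size, in KiB) before the first reload and after each one.
set -euo pipefail

# Read VmRSS for a PID from /proc (Linux only). Prints the value in KiB.
rss_kib() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

main() {
  local runs=${1:-3} pid i
  pid=$(pgrep -o -x pdns_recursor)   # oldest process with exactly this name
  echo "start: $(rss_kib "$pid") KiB"
  for ((i = 1; i <= runs; i++)); do
    rec_control --timeout=60 reload-lua-config /etc/powerdns/blocklists.lua
    sleep 30   # arbitrary delay to let the reload settle before sampling
    echo "after reload $i: $(rss_kib "$pid") KiB"
  done
}

# Usage: measure-reloads.sh --run [N]
if [[ ${1:-} == --run ]]; then
  shift
  main "$@"
fi
```

For example, `measure-reloads.sh --run 3` (script name hypothetical) prints one line before the first reload and one after each of three reloads. RSS is only one of several possible metrics and will generally differ from what `systemctl status` reports.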

Environment

I did not compile pdns-recursor myself; I use the packages from the repositories. The behavior is observed on several systems (Ubuntu, OL9, Debian). The current observation is on Debian:

Linux 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux 

Recursor:

Feb 25 07:07:45 PowerDNS Recursor 5.0.2 (C) PowerDNS.COM BV
Feb 25 07:07:45 Using 64-bits mode. Built using gcc 10.2.1 20210110 on Feb 13 2024 12:55:34 by root@localhost.
Feb 25 07:07:45 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Feb 25 07:07:45 Features: fcontext libcrypto-ecdsa libcrypto-ed25519 libcrypto-ed448 libcrypto-eddsa lua nod protobuf dnstap-framestream snmp sodium curl DoT scrypt
Feb 25 07:07:45 Configured with: " '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--sysconfdir=/etc/powerdns' '--enable-systemd' '--with-systemd=/lib/systemd/system' '--enable-unit-tests' '--disable-silent-rules' '--with-service-user=pdns' '--with-service-group=pdns' '--with-libcap' '--with-libsodium' '--with-lua' '--with-net-snmp' '--enable-dns-over-tls' '--enable-dnstap' '--enable-nod' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/pdns/pdns-recursor-5.0.2=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -ffile-prefix-map=/pdns/pdns-recursor-5.0.2=. -fstack-protector-strong -Wformat -Werror=format-security'"

Steps to reproduce

  1. Load the RPZs in the Lua configuration:
    rpzFile("/opt/powerdns/blocklists/oisd-nsfw.rpz", {ignoreDuplicates = true})
    rpzFile("/opt/powerdns/blocklists/hagezy-anti-privacy.rpz", {ignoreDuplicates = true})
    rpzFile("/opt/powerdns/blocklists/hagezy-gambling.rpz", {ignoreDuplicates = true})
    rpzFile("/opt/powerdns/blocklists/hagezy-multi-normal.rpz", {ignoreDuplicates = true})
    rpzFile("/opt/powerdns/blocklists/hagezy-no-safe-search.rpz", {ignoreDuplicates = true})
    rpzFile("/opt/powerdns/blocklists/hagezy-threat.rpz", {ignoreDuplicates = true})
  2. Reload them:
    
    #!/bin/bash
    curl -o "/opt/powerdns/blocklists/_hagezy-multi-normal.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/multi.txt"
    curl -o "/opt/powerdns/blocklists/_hagezy-gambling.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/gambling.txt"
    curl -o "/opt/powerdns/blocklists/_oisd-nsfw.rpz" "https://nsfw.oisd.nl/rpz"
    curl -o "/opt/powerdns/blocklists/_hagezy-anti-privacy.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/anti.piracy.txt"
    curl -o "/opt/powerdns/blocklists/_hagezy-no-safe-search.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/nosafesearch.txt"
    curl -o "/opt/powerdns/blocklists/_hagezy-threat.rpz" "https://raw.githubusercontent.com/hagezi/dns-blocklists/main/rpz/tif.txt"

    mv /opt/powerdns/blocklists/_hagezy-multi-normal.rpz /opt/powerdns/blocklists/hagezy-multi-normal.rpz
    mv /opt/powerdns/blocklists/_hagezy-gambling.rpz /opt/powerdns/blocklists/hagezy-gambling.rpz
    mv /opt/powerdns/blocklists/_oisd-nsfw.rpz /opt/powerdns/blocklists/oisd-nsfw.rpz
    mv /opt/powerdns/blocklists/_hagezy-anti-privacy.rpz /opt/powerdns/blocklists/hagezy-anti-privacy.rpz
    mv /opt/powerdns/blocklists/_hagezy-no-safe-search.rpz /opt/powerdns/blocklists/hagezy-no-safe-search.rpz
    mv /opt/powerdns/blocklists/_hagezy-threat.rpz /opt/powerdns/blocklists/hagezy-threat.rpz

    rec_control --timeout=60 reload-lua-config /etc/powerdns/blocklists.lua

Expected behaviour

Memory consumption should not increase with successive RPZ reloads.

Actual behaviour

Memory consumption increases over the next two RPZ reloads.

omoerbeek commented 6 months ago

As you note, this is not a real memory leak: memory usage does not keep increasing.

Having both the old and the new RPZ in memory for a while cannot be avoided entirely: the old one is only replaced once the new load succeeds. However, that replacement does not seem to happen immediately.

What I suspect is happening: the RPZ loading mechanism loads an RPZ into memory and each thread gets a reference to that loaded RPZ. When a reload happens, each thread gets a new reference, but the old reference will only be dropped later when some activity occurs in that thread. This can take a while, because not all threads are very busy all the time.

It's on my list of things to study to see whether this can be improved. But since this is not a leak that keeps growing over time, the priority is not very high.

There is also room for improvement with respect to the amount of memory used by the RPZ entries themselves.

ousatov-ua commented 6 months ago

Hi! Thank you for your reply!

The RPZs are reloaded about once a day. The old RPZs are still sitting in memory after 12 hours. Their TTL is 1 day; I will observe what happens after that.

I hope pdns-recursor will discard the old RPZs after one day. Maybe that is based on the TTL of the RPZs?

omoerbeek commented 6 months ago

I don't think the TTL is involved here: what happens is that one or more threads keep a reference to the old RPZ longer than needed. That was my conclusion when I last looked at this, but I did not find the actual root cause at that time.

What complicates matters is that memory usage numbers are hard to interpret: memory can actually be freed but still be visible as allocated to the process. In that case it is immediately available for reuse by the process.

ousatov-ua commented 6 months ago

Thanks! It would be nice to resolve this: memory consumption depends heavily on the RPZs, and if they are large, someone might skip pdns-recursor just because they lack the RAM for this behavior... BTW, thank you for such a good recursor! :)

omoerbeek commented 6 months ago

After some study and a fresh look at things, I was able to produce #13830, which reduces the memory used by RPZs by about 40% and also avoids keeping an extra copy in memory of any initially loaded RPZ.

ousatov-ua commented 6 months ago

Hi @omoerbeek! Awesome news!

ousatov-ua commented 6 months ago

@omoerbeek I'm building your fix locally and will try it out in my "real life" setup :)

ousatov-ua commented 6 months ago

@omoerbeek Could you merge it to master, or does somebody else with the necessary rights have to do it?

ousatov-ua commented 6 months ago

@omoerbeek Hi! Just tested your fix:

Start of recursor: ~370 MB
First reload of the same blocklists: ~650 MB
Second reload of the same blocklists: ~1 GB
Third reload of the same blocklists: ~1 GB
All subsequent reloads leave memory consumption unchanged at ~1 GB.

omoerbeek commented 6 months ago

You don't mention which memory usage metric you use (there are many). The numbers I gave are from two OpenBSD systems (one amd64, one arm64), listing both resident and virtual size. OpenBSD allocation routines are pretty aggressive in returning freed memory to the OS.

The default allocator on Linux (glibc) does not give memory back to the OS as readily, so that is possibly why you're still seeing growth in memory use. When reloading, there is a short period when both the old and the new RPZs are in memory, resulting in a peak usage of 1 GB, which is lower than the number you initially saw.

I'm still a bit puzzled why you are seeing growth after the first reload, though. I'll run a few more tests on Linux to understand that before merging.

ousatov-ua commented 6 months ago

Thank you, you are right! I just check memory usage via systemctl:

systemctl status pdns-recursor
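Which number is compared matters here: `systemctl status` shows the memory systemd accounts to the unit's cgroup, which can also include page cache, whereas the OpenBSD figures mentioned above are the process's resident and virtual size. A small sketch for printing these side by side on Linux, assuming the unit is called `pdns-recursor` (as in the command above) and a single `pdns_recursor` process is running; the helper names `proc_mem` and `unit_mem` and the `--run` switch are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: print several memory metrics for one process so that
# reload-to-reload comparisons use the same number each time.
set -euo pipefail

# Virtual (VmSize) and resident (VmRSS) size in KiB, from /proc (Linux only).
proc_mem() {
  awk '/^(VmSize|VmRSS):/ { printf "%s %s KiB\n", $1, $2 }' "/proc/$1/status"
}

# The unit's cgroup memory in bytes as systemd reports it (the value behind
# the "Memory:" line of systemctl status). Prints nothing if systemctl is
# missing; never fails.
unit_mem() {
  command -v systemctl >/dev/null || return 0
  systemctl show --property=MemoryCurrent --value "$1" 2>/dev/null || true
}

if [[ ${1:-} == --run ]]; then
  pid=$(pgrep -o -x pdns_recursor)
  proc_mem "$pid"
  echo "MemoryCurrent (bytes): $(unit_mem pdns-recursor)"
fi
```

Called as `mem-metrics.sh --run` (hypothetical name), it prints VmSize and VmRSS followed by MemoryCurrent. With glibc, freed memory that the allocator keeps for reuse can still count towards RSS, so RSS may plateau rather than drop after a reload.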

omoerbeek commented 6 months ago

I did a few tests and I'm seeing the same behavior as you on Linux: after a few reloads, memory use stabilizes. So the allocation behavior is a bit different compared to OpenBSD, but the end result is still much better than it was.

ousatov-ua commented 6 months ago

@omoerbeek Yes, much better! Thank you!