PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.72k stars 912 forks source link

Serial in RPZ dump file not correct #14857

Closed g0tar closed 3 days ago

g0tar commented 5 days ago

Short description

Apparently Recursor lacks the command to notify about RPZ file update. reload-lua-config is all-or-nothing and triggers also IXFR "whole zone" transfers for rpzPrimary() RPZ (why is this happening anyway?) which might be multi-megabytes in size, therefore not suitable for frequent updates.

Usecase

I use 3 rpzPrimary() and 2 rpzFile() sources. The DNS-based ones support IXFR, so I don't need to worry about updates, but RPZ files are cyclically fetched from remote servers (possibly with different frequency) and I need to notify recursor to grab new ones.

Description

I'm looking for a way to frequently update RPZ files without refetching entire rpzPrimary(). These zones can be large, so it would be best if this was possible on file-by-file basis, but as local storage is fast enough it might as well trigger all of them at all (with regard to CPU intensiveness of recursor itself, reloading needlessly files that didn't change). I don't want to saturate RPZ-over-DNS providers though by issuing "whole zone" transfers every few minutes.

I thought that maybe reload-zones would work (after reading https://github.com/PowerDNS/pdns/issues/11490#issuecomment-2134915428), but it's not. Maybe this command could gain some switch for RPZ (reload-zones -rpz)?

As an alternative there could be some new rec_control command to load/unload specified RPZ file without invoking entire LUA config, or some inotify watch installed for reload to happen automagically.

Or maybe simply some way to suppress "whole zone" transfers after reload-lua-config?

Currently I'm thinking about file-to-authoritative substitute approach, where RPZ files are handled by separate pdns, which then is used as rpzPrimary() on the main recursor, but this seems a bit too much work for what it's worth.

omoerbeek commented 5 days ago

reload-lua-config should not reload the full RPZs defined by rpzPrimary (only a SOA check should be triggered followed by a partial IXFR if the other side offers that), but maybe a bug crept in.

I'll try to reproduce. For that I need to know which version of the recursor you are running.

Update: my memory was wrong: I do see a full reload. But if you define a dumpFile and seedFile (of the same name), a reload will be done using the file and then an IXFR request will be done to get updates.

Nov 20 11:13:52 msg="Received rec_control command via control socket" subsystem="control" level="0" prio="Info" tid="0" ts="1732097632.810" command="reload-lua-config"
Nov 20 11:13:52 msg="Processing main YAML settings" level="0" prio="Notice" tid="0" ts="1732097632.810" path="./recursor.yml"
Nov 20 11:13:52 msg="YAML config found and processed" level="0" prio="Notice" tid="0" ts="1732097632.810" configname="./recursor.yml"
Nov 20 11:13:52 msg="Processing YAML settings" level="0" prio="Notice" tid="0" ts="1732097632.810" path="tmp/api-dir/allow-from.yml"
Nov 20 11:13:52 msg="Pre-loading RPZ zone from seed file" level="0" prio="Info" tid="0" ts="1732097632.826" seedfile="tmp/rpz.dump" zone="rpz.local"
Nov 20 11:13:52 msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732097632.957" address="192.168.178.3:1533" ourserial="2024111401" zone="rpz.local"
Nov 20 11:14:30 msg="A more recent configuration has been found, stopping the existing RPZ update thread" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732097670.528" zone="rpz.local"
Nov 20 11:14:52 msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732097692.964" address="192.168.178.3:1533" ourserial="2024111401" zone="rpz.local"
Nov 20 11:15:52 msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732097752.972" address="192.168.178.3:1533" ourserial="2024111401" zone="rpz.local"
Nov 20 11:16:52 msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732097812.980" address="192.168.178.3:1533" ourserial="2024111401" zone="rpz.local"
omoerbeek commented 5 days ago

As for the comment in https://github.com/PowerDNS/pdns/issues/11490#issuecomment-2134915428, I'll add a remark.

g0tar commented 5 days ago

I'm running 5.1.2 and I do have both dumpFile and seedFile set. The log of second invocation (~a minute interval) looks like this:

Nov 20 11:40:56: msg="Received rec_control command via control socket" subsystem="control" level="0" prio="Info" tid="0" ts="1732099256.660" command="reload-lua-config /etc/pdns-recursor/lua.d/0-recursor.lua"
Nov 20 11:40:56: msg="Loading RPZ from file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099256.662" file="/var/lib/pdns-recursor/CERT.rpz"
Nov 20 11:40:57: msg="Done loading RPZ from file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.252" file="/var/lib/pdns-recursor/CERT.rpz"
Nov 20 11:40:57: msg="Loading RPZ from file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.252" file="/var/lib/pdns-recursor/URLhaus.rpz"
Nov 20 11:40:57: msg="Done loading RPZ from file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.254" file="/var/lib/pdns-recursor/URLhaus.rpz"
Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.254" seedfile="/var/lib/pdns-recursor/rescure-domains.ioc2rpz" zone="rescure-domains.ioc2rpz"
Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.260" seedfile="/var/lib/pdns-recursor/phishtank.ioc2rpz" zone="phishtank.ioc2rpz"
Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.490" seedfile="/var/lib/pdns-recursor/blox-malicious.ioc2rpz" zone="blox-malicious.ioc2rpz"
Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.508" seedfile="/var/lib/pdns-recursor/bforeai.ioc2rpz" zone="bforeai.ioc2rpz"
Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1730781480" zone="rescure-domains.ioc2rpz"
Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1731069480" zone="phishtank.ioc2rpz"
Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1730781480" zone="blox-malicious.ioc2rpz"
Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1730781480" zone="bforeai.ioc2rpz"
Nov 20 11:40:57: msg="Reloaded" subsystem="config" level="0" prio="Info" tid="0" ts="1732099257.833"
Nov 20 11:40:57: msg="Processing deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.940" size="1" zone="blox-malicious.ioc2rpz"
Nov 20 11:40:57: msg="IXFR update is a whole new zone" subsystem="rpz" level="0" prio="Warning" tid="0" ts="1732099257.941" zone="blox-malicious.ioc2rpz"
Nov 20 11:40:57: msg="RPZ mutations" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.948" additions="2546" newserial="1731991080" removals="0" zone="blox-malicious.ioc2rpz"
Nov 20 11:40:58: msg="Processing deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.076" size="1" zone="phishtank.ioc2rpz"
Nov 20 11:40:58: msg="IXFR update is a whole new zone" subsystem="rpz" level="0" prio="Warning" tid="0" ts="1732099258.097" zone="phishtank.ioc2rpz"
Nov 20 11:40:58: msg="Processing deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.103" size="1" zone="bforeai.ioc2rpz"
Nov 20 11:40:58: msg="IXFR update is a whole new zone" subsystem="rpz" level="0" prio="Warning" tid="0" ts="1732099258.122" zone="bforeai.ioc2rpz"
Nov 20 11:40:58: msg="RPZ mutations" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.199" additions="31327" newserial="1732077480" removals="0" zone="phishtank.ioc2rpz"
Nov 20 11:40:58: msg="RPZ mutations" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.214" additions="29176" newserial="1731991080" removals="0" zone="bforeai.ioc2rpz"

Let's | grep phishtank:

Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.260" seedfile="/var/lib/pdns-recursor/phishtank.ioc2rpz" zone="phishtank.ioc2rpz"
Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1731069480" zone="phishtank.ioc2rpz"
Nov 20 11:40:58: msg="Processing deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.076" size="1" zone="phishtank.ioc2rpz"
Nov 20 11:40:58: msg="IXFR update is a whole new zone" subsystem="rpz" level="0" prio="Warning" tid="0" ts="1732099258.097" zone="phishtank.ioc2rpz"
Nov 20 11:40:58: msg="RPZ mutations" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.199" additions="31327" newserial="1732077480" removals="0" zone="phishtank.ioc2rpz"

but | grep rescure:

Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.254" seedfile="/var/lib/pdns-recursor/rescure-domains.ioc2rpz" zone="rescure-domains.ioc2rpz"
Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1730781480" zone="rescure-domains.ioc2rpz"

The additions shows total entries count (verified with the DumpFile). tcpdump shows there was 1.5 MB of data transferred from the 94.130.30.123, while all 3 files transferred are 4.3 MB in total, but are altered by Policy.Custom, defcontent; the not-transfered rescure-domains.ioc2rpz has mtime of 2024-11-08 and contains IN CNAME ....

My ioc2rpz.lua, included by dofile('/etc/pdns-recursor/lua.d/ioc2rpz.lua'), contains:

rpzPrimary('94.130.30.123', 'rescure-domains.ioc2rpz', { axfrTimeout='180', ignoreDuplicates=true,
        tsigname='ioc2rpz-net-CENSORED', tsigalgo='hmac-sha256', tsigsecret='CENSORED',
        dumpFile='/var/lib/pdns-recursor/rescure-domains.ioc2rpz',
        seedFile='/var/lib/pdns-recursor/rescure-domains.ioc2rpz',
        defpol=Policy.Custom, defcontent='hole.cert.pl',
        })
--[ some comments
]--
rpzPrimary('94.130.30.123', 'phishtank.ioc2rpz', { axfrTimeout='180', ignoreDuplicates=true,
        tsigname='ioc2rpz-net-CENSORED', tsigalgo='hmac-sha256', tsigsecret='CENSORED',
        dumpFile='/var/lib/pdns-recursor/phishtank.ioc2rpz',
        seedFile='/var/lib/pdns-recursor/phishtank.ioc2rpz',
        defpol=Policy.Custom, defcontent='hole.cert.pl',
        })
--[ more comments
]--

Might this be related to defcontent? Why isn't the first RPZ affected?

/var/lib/pdns-recursor$ head -n3 phishtank.ioc2rpz rescure-domains.ioc2rpz
==> phishtank.ioc2rpz <==
phishtank.ioc2rpz. IN SOA ioc2rpz-srv1.ioc2rpz.net. ioc2rpz.ioc2rpz.com. 1731069480 14400 3600 2592000 3600
zzppmwsuejdnfjg.weebly.com.phishtank.ioc2rpz. 900 IN CNAME hole.cert.pl.
*.zzek.xyz.phishtank.ioc2rpz. 900 IN CNAME hole.cert.pl.

==> rescure-domains.ioc2rpz <==
rescure-domains.ioc2rpz. IN SOA ioc2rpz-srv1.ioc2rpz.net. ioc2rpz.ioc2rpz.com. 1730781480 86400 3600 2592000 600
*.zyglsg.tk.rescure-domains.ioc2rpz. 900 IN CNAME .
*.zver.tech.rescure-domains.ioc2rpz. 900 IN CNAME .
g0tar commented 5 days ago

One observation: journalctl | grep -i ixfr | grep -i rescure shows that the ourserial="1730781480" of this particular RPZ was not changed since used for the first time, so there was no file rewrite happening, while the remaining zones had some updates, ourserial has changed and so the files were rewritten (with Policy.Custom applied).

omoerbeek commented 5 days ago

Once complicating matter here is that the server side can decide to send a whole new zone instead of updates, for it's own reasons. This can happen if, for example, the provided SOA by us is not the most recent previous one as the server only has the most recent delta available.

At first glance this could be happening here, but I need to study your logs a bit more. It would be interesting to see if subsequent updates are incremental or not. I'll try to run some tests here as well.

g0tar commented 5 days ago

This happens with every reload-lua-config incantation, so should not be related to updates, but indeed - I've already noted this about two weeks ago - apparently ourserial is stuck and doesn't updates after IXFR:

Nov 20 11:39:14: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099154.461" seedfile="/var/lib/pdns-recursor/bforeai.ioc2rpz" zone="bforeai.ioc2rpz" Nov 20 11:39:14: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099154.740" address="94.130.30.123:53" ourserial="1730781480" zone="bforeai.ioc2rpz" Nov 20 11:39:15: msg="Processing deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099155.011" size="1" zone="bforeai.ioc2rpz" Nov 20 11:39:15: msg="IXFR update is a whole new zone" subsystem="rpz" level="0" prio="Warning" tid="0" ts="1732099155.027" zone="bforeai.ioc2rpz" Nov 20 11:39:15: msg="RPZ mutations" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099155.120" additions="29176" newserial="1731991080" removals="0" zone="bforeai.ioc2rpz" Nov 20 11:40:57: msg="Pre-loading RPZ zone from seed file" subsystem="luaconfig" level="0" prio="Info" tid="0" ts="1732099257.508" seedfile="/var/lib/pdns-recursor/bforeai.ioc2rpz" zone="bforeai.ioc2rpz" Nov 20 11:40:57: msg="Getting IXFR deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099257.832" address="94.130.30.123:53" ourserial="1730781480" zone="bforeai.ioc2rpz" Nov 20 11:40:58: msg="Processing deltas" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.103" size="1" zone="bforeai.ioc2rpz" Nov 20 11:40:58: msg="IXFR update is a whole new zone" subsystem="rpz" level="0" prio="Warning" tid="0" ts="1732099258.122" zone="bforeai.ioc2rpz" Nov 20 11:40:58: msg="RPZ mutations" subsystem="rpz" level="0" prio="Info" tid="0" ts="1732099258.214" additions="29176" newserial="1731991080" removals="0" zone="bforeai.ioc2rpz"

Back then I've seen this with 250 MB Spamhaus RPZ and forgot about this, considering this is some server glitch with incremental updates.

omoerbeek commented 5 days ago

Hi, ourserial not matching the last newserial is fishy indeed.

g0tar commented 4 days ago

I can also confirm, that dump-rpz file contains old serial matching the dumpFile one (dumped automatically), so it's not a kind of race between updating serial and saving the file, but apparently consistently wrong internal state.

omoerbeek commented 4 days ago

I analyzed the error, and should have a fix by tomorrow, including some internal consistency checking so this can be diagnosed more easily if it happens again.