PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

Slave pdns server does not change master upon termination #3602

Open oridistor opened 8 years ago

oridistor commented 8 years ago

I have a slave pdns with the bind backend listening to several different masters for SOA updates and notifications. If the slave is running and the first IP in the masters list is terminated, my slave pdns can't connect to it to get SOA updates, even though it gets notified by the other masters that a change needs to be made.

This obviously gets solved if I restart my pdns server, in which case it takes the next IP in the list, but the connection is always kept only with the first master.

I would like for the slave to either:

I'll only give the slave configuration, as that is what's relevant:

named.conf file:

options {
    directory "/var/powerdns/bind";
};

zone "example.com" IN {
    type slave;
    file "example.com";
    masters { 52.91.135.125; 54.175.233.140; 54.208.97.41; };
};

pdns.conf file:

config-dir=/etc/powerdns
daemon=yes
guardian=yes
disable-axfr=yes
disable-tcp=yes
local-port=53
log-dns-details=on
loglevel=3
master=no
slave=yes
setgid=pdns
setuid=pdns
socket-dir=/var/run
version-string=powerdns
launch=bind
bind-config=/var/powerdns/bind/named.conf
include-dir=/etc/powerdns/pdns.conf.d
bind-check-interval=600
allow-notify-from=0.0.0.0/0
distributor-threads=3
negquery-cache-ttl=10
query-cache-ttl=20
receiver-threads=1
retrieval-threads=2
slave-cycle-interval=60

Thanks in advance

pieterlexis commented 8 years ago

Updated post with code tags.

Can you please share your OS and PowerDNS version number? And are there any files in /etc/powerdns/pdns.conf.d?

oridistor commented 8 years ago

OS - Ubuntu 14.04 (trusty).

PowerDNS version - 3.4.7

oridistor commented 8 years ago

There is a file called "allow-notify-from" containing an 'allow-notify-from=' setting with a lot of IPs, including the relevant ones that are up.

pieterlexis commented 8 years ago

We should try the master the NOTIFY was received from, so if it goes down just after sending the NOTIFY, this could be an issue. But I assume this isn't the case here. Can you increase your loglevel to 6 and provide the logs of when this happens?

oridistor commented 8 years ago

Will do right away.

Do notice that the request goes to the same server, even if that server is down. As I make DNS changes on the masters' side after the first master is down, I can say for certain that in my case the request is sent to the first master and not the one it received the notify from.
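
To make the expected behavior concrete, here is a minimal sketch of the ordering described above: try the master the NOTIFY came from first, then fall through the rest of the `masters` list. This is not PowerDNS code; `transfer_candidates`, `transfer_with_failover` and the `try_axfr` callback are hypothetical names used only for illustration.

```python
from typing import Callable, List, Optional


def transfer_candidates(masters: List[str],
                        notify_source: Optional[str] = None) -> List[str]:
    """Return the masters in the order a transfer should be attempted.

    The NOTIFY sender (if it is a configured master) goes first; the
    remaining masters keep their configured order.
    """
    ordered: List[str] = []
    if notify_source is not None and notify_source in masters:
        ordered.append(notify_source)
    for master in masters:
        if master not in ordered:
            ordered.append(master)
    return ordered


def transfer_with_failover(candidates: List[str],
                           try_axfr: Callable[[str], bool]) -> Optional[str]:
    """Try each candidate in turn; return the first one that succeeds.

    try_axfr is a hypothetical callback that returns True when the
    transfer from the given master worked and False on timeout/error.
    """
    for master in candidates:
        if try_axfr(master):
            return master
    return None


if __name__ == "__main__":
    # Masters taken from the named.conf in this issue.
    masters = ["52.91.135.125", "54.175.233.140", "54.208.97.41"]
    order = transfer_candidates(masters, notify_source="54.208.97.41")
    # Simulate the first configured master being down.
    used = transfer_with_failover(order, lambda m: m != "52.91.135.125")
    print(order, used)
```

With this ordering, a dead first master only costs a timeout when nothing else is reachable, instead of blocking every transfer.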

oridistor commented 8 years ago

Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: 2 slave domains need checking, 0 queued for AXFR
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Received serial number updates for 2 zones, had 0 timeouts
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Domain 'cluster1.frontendtest.redislabs.com' is stale, master serial 27, our serial 26
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Initiating transfer of 'cluster1.frontendtest.redislabs.com' from remote '54.88.105.229'
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Domain 'internal.cluster1.frontendtest.redislabs.com' is stale, master serial 21, our serial 20
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Initiating transfer of 'internal.cluster1.frontendtest.redislabs.com' from remote '54.88.105.229'
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: 10 slave domains need checking, 0 queued for AXFR
Mar 22 09:46:19 ip-10-147-238-59 pdns[22590]: Received serial number updates for 0 zones, had 10 timeouts
Mar 22 09:46:25 ip-10-147-238-59 pdns[22590]: Unable to AXFR zone 'internal.cluster1.frontendtest.redislabs.com' from remote '54.88.105.229' (resolver): Timeout connecting to server
Mar 22 09:46:25 ip-10-147-238-59 pdns[22590]: Unable to AXFR zone 'cluster1.frontendtest.redislabs.com' from remote '54.88.105.229' (resolver): Timeout connecting to server

Do notice that the 10 zones that need checking are zones that have been completely terminated but weren't purged. Even so, you can see the issue with the two stale zones.

The log shows no notify being involved. Instead, when checking for SOA changes we ask all the masters whether a change was made, but once we decide that one was, we target the first master for the transfer. The solution should be to request the transfer from the master on which the SOA change was seen.
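
A rough sketch of that proposal, assuming each master can be asked for its SOA serial individually: remember which master reported the newer serial and transfer from that one. This is not how pdns is implemented; `pick_fresh_master` and the `query_serial` callback are hypothetical names. Serials are compared with RFC 1982 serial number arithmetic, which is what DNS specifies for SOA serials.

```python
from typing import Callable, Iterable, Optional


def serial_is_newer(candidate: int, current: int) -> bool:
    """RFC 1982 comparison for 32-bit serials (handles wrap-around)."""
    diff = (candidate - current) % 2**32
    return 0 < diff < 2**31


def pick_fresh_master(masters: Iterable[str],
                      our_serial: int,
                      query_serial: Callable[[str], Optional[int]]) -> Optional[str]:
    """Return the master with the newest serial ahead of ours, if any.

    query_serial is a hypothetical callback returning the master's SOA
    serial, or None when the master timed out or is unreachable.
    """
    best_master: Optional[str] = None
    best_serial = our_serial
    for master in masters:
        serial = query_serial(master)
        if serial is None:
            continue  # unreachable master: skip it instead of targeting it
        if serial_is_newer(serial, best_serial):
            best_master, best_serial = master, serial
    return best_master


if __name__ == "__main__":
    # First master is down, second has serial 27, we are at 26.
    answers = {"52.91.135.125": None, "54.175.233.140": 27, "54.208.97.41": 26}
    print(pick_fresh_master(list(answers), 26, answers.get))
```

Because the unreachable master never answers the SOA query, it can never be picked as the transfer source.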

oridistor commented 8 years ago

Despite what I wrote before and what the log shows, it's obvious that this error occurs after a notify that for some reason isn't logged. We can tell there was a notify because the error happens right after I perform the action on the slaves that causes that message, with almost no lag.

oridistor commented 8 years ago

https://github.com/PowerDNS/pdns/pull/3631

This is a quick fix I've made for this problem. I don't think it's the best fix, but as we use bind as backend, this is the best I can get.