Open oridistor opened 8 years ago
Updated post with code tags.
Can you please share your OS and PowerDNS version number? And are there any files in /etc/powerdns/pdns.conf.d?
os - ubuntu 14.04 - trusty.
PowerDNS version - 3.4.7
A file called "allow-notify-from" with an 'allow-notify-from=' tag with a lot of IPs, including the relevant ones that are up.
We should try the master the NOTIFY was received from, so if it goes down just after sending the NOTIFY, this could be an issue. But I assume this isn't the case here. Can you increase your loglevel to 6 and provide the logs of when this happens?
will do right away.
Do notice that the NOTIFY goes to the same server, even if that server is down. As I make DNS changes on the masters' side after the first master is down, I can say for certain that in my case the request is sent to the first master and not the one it received the notify from.
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: 2 slave domains need checking, 0 queued for AXFR
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Received serial number updates for 2 zones, had 0 timeouts
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Domain 'cluster1.frontendtest.redislabs.com' is stale, master serial 27, our serial 26
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Initiating transfer of 'cluster1.frontendtest.redislabs.com' from remote '54.88.105.229'
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Domain 'internal.cluster1.frontendtest.redislabs.com' is stale, master serial 21, our serial 20
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Initiating transfer of 'internal.cluster1.frontendtest.redislabs.com' from remote '54.88.105.229'
Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: 10 slave domains need checking, 0 queued for AXFR
Mar 22 09:46:19 ip-10-147-238-59 pdns[22590]: Received serial number updates for 0 zones, had 10 timeouts
Mar 22 09:46:25 ip-10-147-238-59 pdns[22590]: Unable to AXFR zone 'internal.cluster1.frontendtest.redislabs.com' from remote '54.88.105.229' (resolver): Timeout connecting to server
Mar 22 09:46:25 ip-10-147-238-59 pdns[22590]: Unable to AXFR zone 'cluster1.frontendtest.redislabs.com' from remote '54.88.105.229' (resolver): Timeout connecting to server
Note that the 10 zones that need checking are zones that were completely terminated but never purged. Still, you can see the issue with the zone.
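To check which master each transfer is aimed at, the "Initiating transfer" lines can be tallied per remote. This is a minimal sketch that assumes the log format shown above; the function name `transfer_targets` is hypothetical and not part of PowerDNS.

```python
import re
from collections import Counter

# Matches the "Initiating transfer of '<zone>' from remote '<ip>'" lines in the log above.
PATTERN = re.compile(r"Initiating transfer of '([^']+)' from remote '([^']+)'")

def transfer_targets(lines):
    """Count how many transfers were initiated against each remote address."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# Two lines copied from the log excerpt in this thread.
sample = [
    "Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Initiating transfer of "
    "'cluster1.frontendtest.redislabs.com' from remote '54.88.105.229'",
    "Mar 22 09:46:15 ip-10-147-238-59 pdns[22590]: Initiating transfer of "
    "'internal.cluster1.frontendtest.redislabs.com' from remote '54.88.105.229'",
]

for remote, count in transfer_targets(sample).items():
    print(remote, count)
```

If every transfer in a longer log lands on the same remote even while that remote is down, that supports the observation that only the first master is ever tried.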
We see that the problem is that it doesn't send a notify; rather, when checking for an SOA change we ask all the masters whether a change was made, and if we decide it was, we target the first master. The solution should be to ask the master on which the SOA change was seen.
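To make the difference concrete, here is a minimal Python sketch of the two selection strategies. It is not PowerDNS code: the function names, the `serials` map, and the second address (a documentation placeholder) are all hypothetical. It assumes every master has already been asked for its SOA serial, with `None` standing for a timeout.

```python
from typing import Dict, List, Optional

def pick_first_master(masters: List[str]) -> Optional[str]:
    """Behaviour described in this issue: always transfer from the first configured master."""
    return masters[0] if masters else None

def pick_fresh_master(masters: List[str],
                      serials: Dict[str, Optional[int]],
                      our_serial: int) -> Optional[str]:
    """Proposed behaviour: transfer from a master that actually reported a newer serial.

    Plain integer comparison is used for brevity; real SOA serials follow
    RFC 1982 serial arithmetic, which this sketch ignores.
    """
    for master in masters:
        serial = serials.get(master)
        if serial is not None and serial > our_serial:
            return master
    return None

# Hypothetical situation: the first master timed out, the second reports serial 27.
masters = ["54.88.105.229", "192.0.2.10"]
serials = {"54.88.105.229": None, "192.0.2.10": 27}

print(pick_first_master(masters))               # the unreachable first master
print(pick_fresh_master(masters, serials, 26))  # the master that saw the change
```

With the first strategy the transfer is attempted against a server that never answered the SOA query, which matches the timeouts in the log above.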
Despite what I wrote before and what's written in the log, it's obvious that this error occurs after a notify that for some reason isn't logged. We can infer the notify from the fact that the error happens right after I do the action on the slaves that causes that message, with almost no lag.
https://github.com/PowerDNS/pdns/pull/3631
This is a quick fix I've made for this problem. I don't think it's the best fix, but as we use bind as backend, this is the best I can get.
I have a slave pdns with the bind backend listening to several different masters for SOA updates and notifications. If the server is on and the first IP in the list is terminated, my slave pdns can't connect to it to get SOA updates, even though it gets notified by other masters that a change needs to be made.
This obviously gets solved if I restart my pdns server, in which case it takes the next IP in the list, but the connection is always kept with the first master only.
I would like for the slave to either:
I'll only give slave information as they are what's relevant:
named.conf file:
pdns.conf file:
Thanks in advance