Closed Habbie closed 11 years ago
Attachment '' (pdns-syslog) https://gist.github.com/5466738
Attachment '' (zone) https://gist.github.com/5466739
Attachment 'supermaster anti-race workaround' (supermaster-race.patch) https://gist.github.com/5466740
Author: anon Ouch. I'll attach the log as a file.
Author: peter supermaster-race.patch should limit the impact of this issue. It is not a real fix, though.
Author: ahu This patch has been applied, and effectively solves the issue.
Author: anon I am having the same issue with 3.1, so I do not believe this is fixed.
I have a server setup as a slave which receives updates from a supermaster. Updates to existing domains appear to work ok, however if a new zone is created on the supermaster, it crashes the slave when the slave commits the new zone, see below:
Jan 19 16:31:08 ns2 pdns[24478]: AXFR started for 'example.net', transaction started Jan 19 16:31:08 ns2 pdns[24478]: Exiting because communicator thread died with error: Database error trying to insert new slave 'example.net': Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry 'example.net' for key 'name_index' Jan 19 16:31:08 ns2 pdns[24478]: AXFR done for 'example.net', zone committed with serial number 2013011902
pdns on the slave is dead at that point and needs to be started again.
upon looking at the mysql database, I can see that the domains and records tables have been populated with the correct data, so it does indeed seem like a race of sorts.
Any ideas on how to fix the issue?
Author: peter Hello,
the fix is in 3.2, not in 3.1. Can you try with 3.1?
Author: peter Uh. I meant - can you try with 3.2? :)
Author: anon Hi Peter,
I've tried 3.2-rc4 (from freebsd ports). The same error is there but at least the server doesnt crash. The conversation looks a bit messy! There is a subsequent axfr because the version of the SOA on the slave is 0??
Jan 20 16:08:45 ns2 pdns[51746]: Done launching threads, ready to distribute questions Jan 20 16:15:39 ns2 pdns[51746]: Received NOTIFY for tittybumbum.com from 65.19.176.90 for which we are not authoritative Jan 20 16:15:39 ns2 pdns[51746]: Received NOTIFY for tittybumbum.com from 2001:470:1:203::1002 for which we are not authoritative Jan 20 16:15:40 ns2 pdns[51746]: Created new slave zone 'tittybumbum.com' from supermaster 65.19.176.90, queued axfr Jan 20 16:15:40 ns2 pdns[51746]: Initiating transfer of 'tittybumbum.com' from remote '65.19.176.90' Jan 20 16:15:40 ns2 pdns[51746]: Database error trying to create tittybumbum.com for potential supermaster 2001:470:1:203::1002: Database error trying to insert new slave 'tittybumbum.com': Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry 'tittybumbum.com' for key 'name_index' Jan 20 16:15:40 ns2 pdns[51746]: 1 slave domain needs checking, 0 queued for AXFR Jan 20 16:15:40 ns2 pdns[51746]: AXFR started for 'tittybumbum.com', transaction started Jan 20 16:15:40 ns2 pdns[51746]: Received serial number updates for 1 zones, had 0 timeouts Jan 20 16:15:40 ns2 pdns[51746]: Domain 'tittybumbum.com' is stale, master serial 2013012002, our serial 0 Jan 20 16:15:40 ns2 pdns[51746]: Initiating transfer of 'tittybumbum.com' from remote '65.19.176.90' Jan 20 16:15:40 ns2 pdns[51746]: 1 slave domain needs checking, 0 queued for AXFR Jan 20 16:15:40 ns2 pdns[51746]: Received serial number updates for 1 zones, had 0 timeouts Jan 20 16:15:40 ns2 pdns[51746]: Domain 'tittybumbum.com' is stale, master serial 2013012002, our serial 0 Jan 20 16:15:40 ns2 pdns[51746]: AXFR started for 'tittybumbum.com', transaction started Jan 20 16:15:40 ns2 pdns[51746]: AXFR done for 'tittybumbum.com', zone committed with serial number 2013012002 Jan 20 16:15:40 ns2 pdns[51746]: Initiating transfer of 'tittybumbum.com' from remote '65.19.176.90' Jan 20 16:15:40 ns2 pdns[51746]: AXFR started for 'tittybumbum.com', transaction started Jan 20 16:15:40 ns2 pdns[51746]: AXFR done for 'tittybumbum.com', zone committed with serial number 2013012002 Jan 20 16:15:40 ns2 pdns[51746]: AXFR done for 'tittybumbum.com', zone committed with serial number 2013012002 Jan 20 16:15:42 ns2 pdns[51746]: Received NOTIFY for tittybumbum.com from 2001:470:1:203::1002 which is not a master Jan 20 16:15:56 ns2 last message repeated 2 times
Author: anon both supermaster and slave are running the same version of powerdns.
Author: anon there seems to be some ipv4 and ipv6 weirdness in that log...
ie:
a) Received NOTIFY for tittybumbum.com from 2001:470:1:203::1002 for which we are not authoritative
b) Created new slave zone 'tittybumbum.com' from supermaster 65.19.176.90, queued axfr Jan 20 16:15:40
Received notify via ipv6, axfr done via ipv4..
c) Database error trying to create tittybumbum.com for potential supermaster 2001:470:1:203::1002:
There seems to be a bit of flip flopping going on between v4 and v6.
Author: peter There is no flipflopping - it receives both notifies but can only act on one. The initial v4 notify manages to trigger the supermaster entry's creation, which means the v6 will be ignored for that domain from now on.
The 0 serial means "domain not present yet", this is normal before the first AXFR.
And finally, 'not crashing' is what the fix for this issue promised. Avoiding all race conditions in this area is hard; not crashing is what we currently have to offer for this.
Thank you for checking! Closing ticket.
Author: anon I think perhaps using MySQL replication might be a better option. Thanks.
Author: peter Probably cleanest for now. Ticket #506 is tracking the general messiness of dual-stack slaving, by the way.
Hi,
running a local compile of PowerDNS 3.1 RC2 20120420.2574 as a slave with gmysql backend. Master is the same software, and a script creates quite some traffic on the database.
The slave sporadically dies with "Duplicate entry 'ifojedaijee.example' for key 'name_index'". Log entries, grepped for the domain name in question:
root@pdns-debug:/home/mhaber# grep ifojedaijee.example /var/log/syslog Apr 24 14:24:43 pdns-debug pdns-pdns-gmysql-master[10622]: Queued notification of domain 'ifojedaijee.example' to 192.168.1.1 Apr 24 14:24:43 pdns-debug pdns-pdns-gmysql-master[10622]: Queued notification of domain 'ifojedaijee.example' to 192.168.1.2 Apr 24 14:25:20 pdns-debug pdns-pdns-gmysql-slave[10610]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 for which we are not authoritative Apr 24 14:25:22 pdns-debug pdns-pdns-gmysql-master[10622]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 which is not a master Apr 24 14:25:23 pdns-debug pdns-pdns-gmysql-slave[10610]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 for which we are not authoritative Apr 24 14:25:25 pdns-debug pdns-pdns-gmysql-master[10622]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 which is not a master Apr 24 14:25:28 pdns-debug pdns-pdns-gmysql-slave[10610]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 for which we are not authoritative Apr 24 14:25:30 pdns-debug pdns-pdns-gmysql-master[10622]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 which is not a master Apr 24 14:25:35 pdns-debug pdns-pdns-gmysql-slave[10610]: Created new slave zone 'ifojedaijee.example' from supermaster 192.168.1.1, queued axfr Apr 24 14:25:36 pdns-debug pdns-pdns-gmysql-slave[10610]: Initiating transfer of 'ifojedaijee.example' from remote '192.168.1.1' Apr 24 14:25:36 pdns-debug pdns-pdns-gmysql-master[10622]: AXFR of domain 'ifojedaijee.example' initiated by 192.168.1.2 Apr 24 14:25:36 pdns-debug pdns-pdns-gmysql-master[10622]: AXFR of domain 'ifojedaijee.example' allowed: client IP 192.168.1.2 is in allow-axfr-ips Apr 24 14:25:37 pdns-debug pdns-pdns-gmysql-slave[10610]: AXFR started for 'ifojedaijee.example', transaction started Apr 24 14:25:37 pdns-debug pdns-pdns-gmysql-slave[10610]: AXFR done for 'ifojedaijee.example', zone committed with serial number 3 Apr 24 14:25:37 pdns-debug pdns-pdns-gmysql-master[10622]: AXFR of domain 'ifojedaijee.example' to 192.168.1.2 finished Apr 24 14:25:39 pdns-debug pdns-pdns-gmysql-master[10622]: Removed from notification list: 'ifojedaijee.example' to 192.168.1.2:53 (was acknowledged) Apr 24 14:25:43 pdns-debug pdns-pdns-gmysql-master[10622]: Received unsuccessful notification report for 'ifojedaijee.example' from 192.168.1.1:53, rcode: 5 Apr 24 14:25:43 pdns-debug pdns-pdns-gmysql-master[10622]: Removed from notification list: 'ifojedaijee.example' to 192.168.1.1:53 Apr 24 14:25:51 pdns-debug pdns-pdns-gmysql-master[10622]: Received spurious notify answer for 'ifojedaijee.example' from 192.168.1.2:53 Apr 24 14:26:00 pdns-debug pdns-pdns-gmysql-master[10622]: Received unsuccessful notification report for 'ifojedaijee.example' from 192.168.1.1:53, rcode: 5 Apr 24 14:26:00 pdns-debug pdns-pdns-gmysql-master[10622]: Received spurious notify answer for 'ifojedaijee.example' from 192.168.1.1:53 Apr 24 14:26:02 pdns-debug pdns-pdns-gmysql-master[10622]: Received spurious notify answer for 'ifojedaijee.example' from 192.168.1.2:53 Apr 24 14:26:22 pdns-debug pdns-pdns-gmysql-slave[10610]: Communicator thread died because of error: Database error trying to insert new slave 'ifojedaijee.example': Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry 'ifojedaijee.example' for key 'name_index' root@pdns-debug:/home/mhaber#
in this (lab) case, Master and Slave are running on the same box (but on different IP addresses and on different databases, albeit in the same MySQL instance).
The Master continues to run just fine, so I suggest that there is no duplicate entry. Maybe the slave struggles about being notified three times in a row?
Please also notice that the MySQL error happens like ~ 30 seconds after the domain was touched by the slave for the last time, and 20 seconds after the domain was touched for the last time at all.