PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.67k stars 907 forks source link

Duplicate entry 'ifojedaijee.example' for key 'name_index' #453

Closed Habbie closed 11 years ago

Habbie commented 11 years ago

Hi,

running a local compile of PowerDNS 3.1 RC2 20120420.2574 as a slave with gmysql backend. Master is the same software, and a script creates quite some traffic on the database.

The slave sporadically dies with "Duplicate entry 'ifojedaijee.example' for key 'name_index'". Log entries, grepped for the domain name in question:

root@pdns-debug:/home/mhaber# grep ifojedaijee.example /var/log/syslog Apr 24 14:24:43 pdns-debug pdns-pdns-gmysql-master[10622]: Queued notification of domain 'ifojedaijee.example' to 192.168.1.1 Apr 24 14:24:43 pdns-debug pdns-pdns-gmysql-master[10622]: Queued notification of domain 'ifojedaijee.example' to 192.168.1.2 Apr 24 14:25:20 pdns-debug pdns-pdns-gmysql-slave[10610]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 for which we are not authoritative Apr 24 14:25:22 pdns-debug pdns-pdns-gmysql-master[10622]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 which is not a master Apr 24 14:25:23 pdns-debug pdns-pdns-gmysql-slave[10610]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 for which we are not authoritative Apr 24 14:25:25 pdns-debug pdns-pdns-gmysql-master[10622]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 which is not a master Apr 24 14:25:28 pdns-debug pdns-pdns-gmysql-slave[10610]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 for which we are not authoritative Apr 24 14:25:30 pdns-debug pdns-pdns-gmysql-master[10622]: Received NOTIFY for ifojedaijee.example from 192.168.1.1 which is not a master Apr 24 14:25:35 pdns-debug pdns-pdns-gmysql-slave[10610]: Created new slave zone 'ifojedaijee.example' from supermaster 192.168.1.1, queued axfr Apr 24 14:25:36 pdns-debug pdns-pdns-gmysql-slave[10610]: Initiating transfer of 'ifojedaijee.example' from remote '192.168.1.1' Apr 24 14:25:36 pdns-debug pdns-pdns-gmysql-master[10622]: AXFR of domain 'ifojedaijee.example' initiated by 192.168.1.2 Apr 24 14:25:36 pdns-debug pdns-pdns-gmysql-master[10622]: AXFR of domain 'ifojedaijee.example' allowed: client IP 192.168.1.2 is in allow-axfr-ips Apr 24 14:25:37 pdns-debug pdns-pdns-gmysql-slave[10610]: AXFR started for 'ifojedaijee.example', transaction started Apr 24 14:25:37 pdns-debug pdns-pdns-gmysql-slave[10610]: AXFR done for 'ifojedaijee.example', zone committed with serial number 3 Apr 24 14:25:37 pdns-debug pdns-pdns-gmysql-master[10622]: AXFR of domain 'ifojedaijee.example' to 192.168.1.2 finished Apr 24 14:25:39 pdns-debug pdns-pdns-gmysql-master[10622]: Removed from notification list: 'ifojedaijee.example' to 192.168.1.2:53 (was acknowledged) Apr 24 14:25:43 pdns-debug pdns-pdns-gmysql-master[10622]: Received unsuccessful notification report for 'ifojedaijee.example' from 192.168.1.1:53, rcode: 5 Apr 24 14:25:43 pdns-debug pdns-pdns-gmysql-master[10622]: Removed from notification list: 'ifojedaijee.example' to 192.168.1.1:53 Apr 24 14:25:51 pdns-debug pdns-pdns-gmysql-master[10622]: Received spurious notify answer for 'ifojedaijee.example' from 192.168.1.2:53 Apr 24 14:26:00 pdns-debug pdns-pdns-gmysql-master[10622]: Received unsuccessful notification report for 'ifojedaijee.example' from 192.168.1.1:53, rcode: 5 Apr 24 14:26:00 pdns-debug pdns-pdns-gmysql-master[10622]: Received spurious notify answer for 'ifojedaijee.example' from 192.168.1.1:53 Apr 24 14:26:02 pdns-debug pdns-pdns-gmysql-master[10622]: Received spurious notify answer for 'ifojedaijee.example' from 192.168.1.2:53 Apr 24 14:26:22 pdns-debug pdns-pdns-gmysql-slave[10610]: Communicator thread died because of error: Database error trying to insert new slave 'ifojedaijee.example': Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry 'ifojedaijee.example' for key 'name_index' root@pdns-debug:/home/mhaber#

in this (lab) case, Master and Slave are running on the same box (but on different IP addresses and on different databases, albeit in the same MySQL instance).

The Master continues to run just fine, so I suggest that there is no duplicate entry. Maybe the slave struggles about being notified three times in a row?

Please also notice that the MySQL error happens like ~ 30 seconds after the domain was touched by the slave for the last time, and 20 seconds after the domain was touched for the last time at all.

Habbie commented 11 years ago

Attachment '' (pdns-syslog) https://gist.github.com/5466738

Habbie commented 11 years ago

Attachment '' (zone) https://gist.github.com/5466739

Habbie commented 11 years ago

Attachment 'supermaster anti-race workaround' (supermaster-race.patch) https://gist.github.com/5466740

Habbie commented 11 years ago

Author: anon Ouch. I'll attach the log as a file.

Habbie commented 11 years ago

Author: peter supermaster-race.patch should limit the impact of this issue. It is not a real fix, though.

Habbie commented 11 years ago

Author: ahu This patch has been applied, and effectively solves the issue.

Habbie commented 11 years ago

Author: anon I am having the same issue with 3.1, so I do not believe this is fixed.

I have a server setup as a slave which receives updates from a supermaster. Updates to existing domains appear to work ok, however if a new zone is created on the supermaster, it crashes the slave when the slave commits the new zone, see below:

Jan 19 16:31:08 ns2 pdns[24478]: AXFR started for 'example.net', transaction started Jan 19 16:31:08 ns2 pdns[24478]: Exiting because communicator thread died with error: Database error trying to insert new slave 'example.net': Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry 'example.net' for key 'name_index' Jan 19 16:31:08 ns2 pdns[24478]: AXFR done for 'example.net', zone committed with serial number 2013011902

pdns on the slave is dead at that point and needs to be started again.

upon looking at the mysql database, I can see that the domains and records tables have been populated with the correct data, so it does indeed seem like a race of sorts.

Any ideas on how to fix the issue?

Habbie commented 11 years ago

Author: peter Hello,

the fix is in 3.2, not in 3.1. Can you try with 3.1?

Habbie commented 11 years ago

Author: peter Uh. I meant - can you try with 3.2? :)

Habbie commented 11 years ago

Author: anon Hi Peter,

I've tried 3.2-rc4 (from freebsd ports). The same error is there but at least the server doesnt crash. The conversation looks a bit messy! There is a subsequent axfr because the version of the SOA on the slave is 0??


Jan 20 16:08:45 ns2 pdns[51746]: Done launching threads, ready to distribute questions Jan 20 16:15:39 ns2 pdns[51746]: Received NOTIFY for tittybumbum.com from 65.19.176.90 for which we are not authoritative Jan 20 16:15:39 ns2 pdns[51746]: Received NOTIFY for tittybumbum.com from 2001:470:1:203::1002 for which we are not authoritative Jan 20 16:15:40 ns2 pdns[51746]: Created new slave zone 'tittybumbum.com' from supermaster 65.19.176.90, queued axfr Jan 20 16:15:40 ns2 pdns[51746]: Initiating transfer of 'tittybumbum.com' from remote '65.19.176.90' Jan 20 16:15:40 ns2 pdns[51746]: Database error trying to create tittybumbum.com for potential supermaster 2001:470:1:203::1002: Database error trying to insert new slave 'tittybumbum.com': Failed to execute mysql_query, perhaps connection died? Err=1: Duplicate entry 'tittybumbum.com' for key 'name_index' Jan 20 16:15:40 ns2 pdns[51746]: 1 slave domain needs checking, 0 queued for AXFR Jan 20 16:15:40 ns2 pdns[51746]: AXFR started for 'tittybumbum.com', transaction started Jan 20 16:15:40 ns2 pdns[51746]: Received serial number updates for 1 zones, had 0 timeouts Jan 20 16:15:40 ns2 pdns[51746]: Domain 'tittybumbum.com' is stale, master serial 2013012002, our serial 0 Jan 20 16:15:40 ns2 pdns[51746]: Initiating transfer of 'tittybumbum.com' from remote '65.19.176.90' Jan 20 16:15:40 ns2 pdns[51746]: 1 slave domain needs checking, 0 queued for AXFR Jan 20 16:15:40 ns2 pdns[51746]: Received serial number updates for 1 zones, had 0 timeouts Jan 20 16:15:40 ns2 pdns[51746]: Domain 'tittybumbum.com' is stale, master serial 2013012002, our serial 0 Jan 20 16:15:40 ns2 pdns[51746]: AXFR started for 'tittybumbum.com', transaction started Jan 20 16:15:40 ns2 pdns[51746]: AXFR done for 'tittybumbum.com', zone committed with serial number 2013012002 Jan 20 16:15:40 ns2 pdns[51746]: Initiating transfer of 'tittybumbum.com' from remote '65.19.176.90' Jan 20 16:15:40 ns2 pdns[51746]: AXFR started for 'tittybumbum.com', transaction started Jan 20 16:15:40 ns2 pdns[51746]: AXFR done for 'tittybumbum.com', zone committed with serial number 2013012002 Jan 20 16:15:40 ns2 pdns[51746]: AXFR done for 'tittybumbum.com', zone committed with serial number 2013012002 Jan 20 16:15:42 ns2 pdns[51746]: Received NOTIFY for tittybumbum.com from 2001:470:1:203::1002 which is not a master Jan 20 16:15:56 ns2 last message repeated 2 times

Habbie commented 11 years ago

Author: anon both supermaster and slave are running the same version of powerdns.

Habbie commented 11 years ago

Author: anon there seems to be some ipv4 and ipv6 weirdness in that log...

ie:

a) Received NOTIFY for tittybumbum.com from 2001:470:1:203::1002 for which we are not authoritative

b) Created new slave zone 'tittybumbum.com' from supermaster 65.19.176.90, queued axfr Jan 20 16:15:40

Received notify via ipv6, axfr done via ipv4..

c) Database error trying to create tittybumbum.com for potential supermaster 2001:470:1:203::1002:

There seems to be a bit of flip flopping going on between v4 and v6.

Habbie commented 11 years ago

Author: peter There is no flipflopping - it receives both notifies but can only act on one. The initial v4 notify manages to trigger the supermaster entry's creation, which means the v6 will be ignored for that domain from now on.

The 0 serial means "domain not present yet", this is normal before the first AXFR.

And finally, 'not crashing' is what the fix for this issue promised. Avoiding all race conditions in this area is hard; not crashing is what we currently have to offer for this.

Thank you for checking! Closing ticket.

Habbie commented 11 years ago

Author: anon I think perhaps using MySQL replication might be a better option. Thanks.

Habbie commented 11 years ago

Author: peter Probably cleanest for now. Ticket #506 is tracking the general messiness of dual-stack slaving, by the way.