PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0
3.67k stars 906 forks source link

batch deletion of records in a zone during AXFR to create smaller sql requests which do not reach gmysql-timeout #11837

Open martinbarlow opened 2 years ago

martinbarlow commented 2 years ago

Authoritative server as secondary handle large zones

If a zone is massive (millions of records) it can take several minutes to purge a zone as part of an AXFR. This can exceed the default gmysql-timeout of 10 seconds.

It would be better to perform smaller batches of deletes, within a transaction.

Usecase

handle large zones on secondary servers, without having to increase gmysql-timeout to very large values. (5-10 minutes)

Description

Increasing gmysql-timeout to very large numbers does not feel optimal. It would be better to batch delete records in a zone in a transaction, instead of trying to do one big query which can take many minutes.

Habbie commented 2 years ago

I don't follow how doing more, smaller, queries would be more optimal than one big query. (But I haven't yet looked at what we do exactly). Can you explain why it would be better?

martinbarlow commented 2 years ago

habbie, thanks for your consideration.

Today when there is an AXFR for a large zone on secondary, it does

delete from records where domain_id=?:

for a very large zone (millions of records) , this can take many minutes and will take longer than gmysql-timeout

https://www.rathishkumar.in/2017/12/deleting-huge-number-of-records-in-mysql.html

instead do

begin; while more records in table: delete from records where domain_id=? limit 10000 commit;

increasing the gmysql-timeout to very large numbers does not seem ideal, as it likely causes problems in other places or mask real issues with database. gmysql-timeout is also a semi hidden internal timeout, that folks can hit arbitrarily without knowing the ability exists to hit it in normal operations.

Habbie commented 2 years ago

Ok, understood. The delete would actually take slightly longer, but cut up in chunks that do not hit the timeout. This makes sense to me. I will milestone this "auth-helpneeded", which means we consider the question valid, but do not plan to work on it ourselves any time soon - but we will consider patches if people PR them.