Open peterthomassen opened 3 years ago
We don't need to worry about this with pdns 4.5: https://github.com/PowerDNS/pdns/pull/10196
We thus should not spend time developing a permanent fix. If it resurfaces before pdns 4.5, we can just rerun the above queries.
Although pdns 4.5 has AXFR priority levels, the problem still resurfaces when last_check clusters around similar values on nsmaster. As a result, replication slows down, especially to remote POPs, and update delays become large enough to trigger monitoring alerts.
Recovery happens automatically once replication has caught up everywhere (around 30-45 minutes in North America, up to 75 minutes in Asia and South America, and up to 90 minutes in Oceania). Data from today and last week.
This is confirmed by running the above SQL for checking the hour of last_check (modified for Postgres due to a2c259d835c133755e2f10af5ea4b88092ca71e8):
SELECT count(*) AS lc_count, ceil(lc_mod/3600)::integer % 24 AS lc_ceil FROM (SELECT *, last_check AS lc_mod FROM domains) AS b GROUP BY lc_ceil ORDER BY lc_ceil;
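For readers who want to see what the check computes, here is a minimal Python sketch of the same hourly bucketing. It assumes last_check is an integer epoch column, so the division behaves like integer division; the function name and the sample timestamps are made up for illustration.

```python
from collections import Counter


def hour_histogram(last_checks):
    """Count zones per hour-of-day bucket (UTC) of their last_check value."""
    # Integer division mirrors integer arithmetic on an integer column;
    # NULL (None) values are skipped here.
    return Counter((lc // 3600) % 24 for lc in last_checks if lc is not None)


# Hypothetical sample: three zones checked within the same hour, one elsewhere,
# and one zone that has never been checked.
sample = [1_700_000_000, 1_700_000_100, 1_700_001_000, 1_700_040_000, None]
hist = hour_histogram(sample)

for hour, count in sorted(hist.items()):
    print(f"hour {hour:2d}: {count} zone(s)")
```

A heavily skewed histogram (most zones in a few buckets) corresponds to the clustering described above.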
Will have to think some more about how to address this permanently. (Perhaps by running the stretching SQL weekly, but something less hacky would be great.)
@nils-wisiol
For the record, the Postgres statement corresponding to the MySQL UPDATE statement above is:
UPDATE domains SET last_check = CEIL(extract(epoch from now()) - random() * 86400) WHERE last_check IS NOT NULL;
NOTE: This update schedules all freshness checks uniformly across the next 24 hours. Consequently, some checks will only happen "tomorrow" (close to 24 hours from now), even when signature rollovers are due "today". Until those checks run, publicly visible signatures will be valid for only 6 days into the future (instead of the usual 7 or more), which may trip our monitoring.
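As a rough illustration of the stretching effect, the following Python sketch applies the same formula (now minus a uniform random offset of up to 86400 seconds, rounded up) to a hypothetical set of zones and counts them per hour bucket. The zone count, the seed, and the fixed "now" timestamp are assumptions for the sake of a reproducible example, not values from our setup.

```python
import math
import random
from collections import Counter


def stretch(n_zones, now, rng):
    """Draw a last_check for each zone uniformly from the past 24 hours."""
    return [math.ceil(now - rng.random() * 86400) for _ in range(n_zones)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
now = 1_700_000_000     # hypothetical "current" epoch timestamp
stretched = stretch(24_000, now, rng)

# Same hourly bucketing as the check query, assuming integer epoch values.
per_hour = Counter((lc // 3600) % 24 for lc in stretched)
print(min(per_hour.values()), max(per_hour.values()))
```

With a uniform draw, every hour bucket should end up with roughly the same number of zones, which is the opposite of the clustering that slows replication.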
Query for stretching:
Check: