nearmiss opened this issue 7 years ago
I'm not sure if this is enough, but would it be an idea to allow 'slave-cycle-interval' to be set to 0 (zero) and disable the slave-cycle checks altogether?
Using 0 currently gives you extremely frequent checks, but fixing the code so that 0 means 'do nothing' makes sense to me.
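For clarity, a minimal pdns.conf sketch of that proposal - slave-cycle-interval is an existing setting, but the "0 disables the cycle" semantics are the suggestion above, not current behaviour:

# proposed semantics, not current behaviour: 0 would mean "never run the
# secondary maintenance cycle"; today 0 effectively means "run it constantly"
slave-cycle-interval=0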
Problem is that slave-cycle-interval is also used for master operation, so allowing 0 would impact that functionality as well. It sounds easy, but is not :-)
I think having a master-cycle-interval might make sense to control that, but the functionality in CommunicatorClass::mainloop would have to be split up. Maybe two threads? Maybe the implementation just needs to become smarter about the two values.
Or we just add a 'disable-slave-cycle' setting and check for that.
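A rough sketch of that idea in pdns.conf - the disable-slave-cycle setting below is hypothetical and does not exist in PowerDNS today:

# hypothetical switch: skip the periodic freshness checks for secondary zones
disable-slave-cycle=yes
# slave-cycle-interval would then only drive the master-side (NOTIFY) work
slave-cycle-interval=60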
I second that. While my setup does not have millions of zones, even ~200K zones periodically produce a lot of unnecessary traffic and consume a significant amount of CPU time every *-cycle-interval seconds - and, even worse, on every incoming NOTIFY/AXFR (which for some reason forces a freshness check on all secondary zones).
For "pure" secondary operations, when we can guarantee that data in the backend is never modified by anything but PDNS itself, there is absolutely no need to periodically check for freshness nor to re-read all zones as we always know their current state which could be modified only by using API or NOTIFY.
An option to disable both zone refresh and SOA checks for all secondary zones would be really nice.
If you want to fully disable SOA checks (so not just at startup), just make sure your config does not have slave=yes or secondary=yes.
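In pdns.conf terms that means leaving secondary operation off entirely ('secondary' is the newer name for the same switch as 'slave'):

# no secondary operation at all: no SOA freshness checks, but also no
# NOTIFY-driven provisioning or incoming zone transfers (see the caveats below)
secondary=no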
When slave=no, notifications about new zones are refused and zones cannot be provisioned automatically on NOTIFY.
Moreover, even explicit AXFRs are silently ignored for existing slave zones, so the only way to provision or update zones is to use the API (though I am not sure this works for slave zones when slave operation is disabled) or to update the backend data directly, which over-complicates and slows things down.
Sorry, I misread. You did want NOTIFY handling.
Exactly. It works perfectly except for the high CPU usage and unnecessary traffic: a change on the master is propagated automatically, the slaves receive a NOTIFY and retrieve the update, and the periodic check ensures that recently modified zones are up to date (so traffic is minimal) - and system load (including disk I/O) is minimal.
The main problem is that when plenty of zones are updated frequently - say, one update every 10 seconds, which is quite possible when some of them use dyndns - the CPU load stays at 100% all the time, as one "refresh" pass takes ca. 10 s.
Are you using a DB backend? Why not just change the DB query?
gpgsql-info-all-slaves-query=SELECT id,name,master,last_check from domains where 1=0;
The 1=0 condition matches no rows, so the slave cycle never finds any zones to check.
We do not have issues with SOA checks (millions of zones). We limit the number of zones processed per slave cycle by adding a "limit 5000" to the info-all-slaves-query. Further, we have some hacks to check each zone only once a day, by forcing the "refresh" value to 24h.
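As a sketch of that approach, assuming the gpgsql backend and the stock 'domains' schema (the default query is roughly the same, minus the limit):

# hand at most 5000 secondary zones to each slave cycle
gpgsql-info-all-slaves-query=SELECT id,name,master,last_check FROM domains WHERE type='SLAVE' LIMIT 5000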
But I do not understand why there is a problem at startup. AFAIK PowerDNS does not differentiate between having just started and having been running for hours, so a restart will not trigger SOA checks for all zones, only for zones where last_check + refresh < NOW().
@klaus-nicat I have tried this already, but the problem is that any SQL backend is slower than LMDB, which I plan to use on all secondary servers (currently this is SQLite to avoid another dependency and save resources).
My problem is that I periodically face DDoS attacks, when every bit of performance counts, and due to the nature of these attacks (NXDOMAIN floods), SQL backends are stressed to the point that regular requests are not answered in time - the flood of random queries quickly defeats even a large cache.
LMDB, on the other hand, gives me an additional 20-30% of performance, but there is still the periodic refresh of all secondary zones - on every zone update (which happens quite often) - which also takes its share and cannot be disabled.
In any case, on secondary-only servers (no direct database updates) where zones are pushed by NOTIFY (or AXFRed via the API), any full re-reading of the database makes little sense, as the server is (or at least should be) always aware of the current zone status and could therefore refresh zones individually (but it looks like this does not happen).
But I do not understand why there is a problem at startup
There is no problem at startup, of course; I didn't say that.
Short description
I run a slave using a bind instance as a supermaster, receiving updates via NOTIFY. On startup of my pdns slave, it appears to attempt to check the freshness of each of my zones (millions of them), causing a massive queue of SOA checks that never really seems to recover. I'd like to be able to stop my slave from proactively checking any zones, relying solely on my master to send NOTIFYs. I have checks and balances in place to re-trigger NOTIFYs from my master should the slave go offline, negating (I think?) the need to check on startup.
Usecase
I run a slave using a supermaster, which has millions of zones. I need to be able to start up my slave without it getting bogged down attempting to re-check all my zones against the supermaster.
Description
I'd like to be able to prevent my slave from initiating any checks of its zones, other than responding to NOTIFYs, via some flag or config value.
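For illustration, the requested behaviour could look roughly like this in pdns.conf, reusing the hypothetical disable-slave-cycle switch discussed earlier (again, not an existing setting):

# keep NOTIFY handling and zone transfers working
secondary=yes
# hypothetical: never proactively walk all zones for SOA freshness checks
disable-slave-cycle=yes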