Don't broadcast Checkable#next_check updates made just not to check twice

Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.

https://icinga.com/docs/icinga2/latest

GNU General Public License v2.0

2k stars 574 forks source link

Don't broadcast Checkable#next_check updates made just not to check twice #10093

Open Al2Klimov opened 3 months ago

Al2Klimov commented 3 months ago

The checker sorts Checkables by next_check while picking the next due one, so we (already) have to advance next_check while starting a check. But the second master doesn't need this info, as it's not responsible.

refs #10082

TODO

[x] #10011

Al2Klimov commented 3 months ago

@yhabteab Are you sure all SetNextCheck remaining after #10011 (which don't already call Checkable::OnNextCheckUpdated) shall be written to DB? Especially in Checkable#Start() or those affecting parent/child..

yhabteab commented 3 months ago

No, I'm not! I haven't even checked all of them in detail, but aren't they just like the other superfluous ones from Checkable::ExecuteCheck() and used to enforce the scheduler to re-index its idle checkables queue?

Al2Klimov commented 3 months ago

I don't think so.

https://github.com/Icinga/icinga2/blob/bca1a8447af26313123183efa863f947f9590731/lib/icinga/checkable-check.cpp#L392-L404

https://github.com/Icinga/icinga2/blob/bca1a8447af26313123183efa863f947f9590731/lib/icinga/checkable-check.cpp#L406-L420

Especially these two re-schedule the next check of other checkables as something happened with the current one. The other HA node may be responsible for them, so of course this matters for the cluster. But IMAO this doesn't matter for our backends. I think we can stop our anti-SetNextCheck witch-hunt here.

yhabteab commented 1 week ago

The other HA node may be responsible for them, so of course this matters for the cluster.

We literally trigger an OnNewCheckResult event just two lines below the highlighted code, which is synced and goes into the same code path on the other node.

Al2Klimov commented 6 days ago

In addition, CheckerComponent::NextCheckChangedHandler needs these two (not suppressed!) events too, so that these next_check updates are effective at all.

yhabteab commented 6 days ago

Honestly, I don't understand why you're turning down all the suggestions, just to stick to the first idea that came to your mind. Let's be real, no one said this should work out right away with either suggestions. However, compared to adding yet another useless cluster event, this could be a much better solution. We already have enough problems with the countless/unpredictable RPC messages to deal with. So, why add yet another one when there's a better alternative?

I'm not saying these SetNextCheck calls are useless, but you should first minimise the places where this method is called. After that, we can discuss how these remaining 2-3 calls should be handled. If you are only concerned about these two calls, then two additional Icinga DB events are nothing to me if we can reduce the load on the cluster as a result.

Al2Klimov commented 6 days ago

Actually I'm totally fine with both (https://github.com/Icinga/icinga2/issues/10082#issuecomment-2298640579) a new event or a flag in the existing one. Also, I just said (https://github.com/Icinga/icinga2/pull/10093#issuecomment-2377330210) these events are needed locally. For the latter we could add flag(s) to setters and event handlers, so that the latter can say: Oh, this event is not for broadcasting, so I won't send it as cluster message. Ok?

julianbrost commented 2 days ago

refs #10082

How does this relate to that issue now? If this PR was merged as-is, how would you address that bug?