Icinga / icingaweb2

A lightweight and extensible web interface to keep an eye on your environment. Analyse problems and act on them.
https://icinga.com/get-started/
GNU General Public License v2.0

Change or disable global status errors right after a director deployment #5259

Open slalomsk8er opened 1 week ago

slalomsk8er commented 1 week ago

Is your feature request related to a problem? Please describe.

Every Director deployment results in messages like this: (image). This makes Icinga look bad in the eyes of the users.

Describe the solution you'd like

Make the code check whether the Director has just deployed and, based on this, either increase the timeout or change the message text and color.
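A minimal sketch of what such a check could look like, written in Python for brevity rather than in the PHP that Icinga Web uses. Everything here is hypothetical: the function name, the three status strings, and all three durations are assumptions for illustration, not values taken from Icinga DB Web or the Director.

```python
from typing import Optional

# All three values are assumptions for illustration only.
DEFAULT_TIMEOUT = 60.0   # seconds a heartbeat may be old before an error is shown
GRACE_TIMEOUT = 180.0    # relaxed limit that applies shortly after a deployment
GRACE_WINDOW = 300.0     # how long after a deployment the relaxed limit applies


def status(now: float, last_heartbeat: float,
           last_deployment: Optional[float]) -> str:
    """Classify the heartbeat age, relaxing the limit after a recent deployment.

    Returns "ok", "deploying" (a softer message) or "error".
    """
    age = now - last_heartbeat

    # Use the relaxed limit only if a deployment happened within the window.
    timeout = DEFAULT_TIMEOUT
    if last_deployment is not None:
        since_deployment = now - last_deployment
        if 0 <= since_deployment <= GRACE_WINDOW:
            timeout = GRACE_TIMEOUT

    if age <= DEFAULT_TIMEOUT:
        return "ok"
    if age <= timeout:
        return "deploying"
    return "error"
```

With these numbers, a heartbeat that is 100 seconds old yields "error" normally, but "deploying" if a deployment happened 50 seconds ago.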

Describe alternatives you've considered

Globally increase the timeout.


nilmerg commented 1 week ago

Hey, how exactly is the deployment performed?

slalomsk8er commented 1 week ago

We deploy manually ATM.

nilmerg commented 1 week ago

And what does manually mean? Exactly? :wink:

slalomsk8er commented 1 week ago

Clicking on one of the "Ausrollen" (German for "Deploy") links that are distributed all over the Director. 😉

nilmerg commented 1 week ago

I cannot reproduce this (with a sleep(120) in my director config).

Any idea what could cause this? @yhabteab

yhabteab commented 6 days ago

> I cannot reproduce this (with a sleep(120) in my director config).

As we discussed last time, a Director deployment should never prevent Icinga DB from updating the icingadb_instance table, but looking at the Icinga DB Web code, I see two reasons why this might happen:

nilmerg commented 6 days ago

If there are multiple rows in icingadb_instance, Icinga DB Web makes sure that the newest (heartbeat desc) is evaluated. So it shouldn't be affected by this :thinking:
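To illustrate the "newest row wins" behaviour, here is a small Python sketch over in-memory rows. The dictionary layout and the function name are assumptions for illustration. The real selection happens in a database query ordered by heartbeat, not in code like this.

```python
from typing import Optional


def newest_instance(rows: list[dict]) -> Optional[dict]:
    """Return the row with the largest heartbeat, or None if there are no rows.

    Equivalent in spirit to ordering by heartbeat descending and taking
    the first row.
    """
    return max(rows, key=lambda row: row["heartbeat"], default=None)


# Hypothetical rows; heartbeat values are arbitrary timestamps.
rows = [
    {"name": "instance-a", "heartbeat": 1_000},
    {"name": "instance-b", "heartbeat": 5_000},
    {"name": "instance-c", "heartbeat": 3_000},
]
newest = newest_instance(rows)
```

Here `newest` is the `instance-b` row, so an older row from another instance would not be the one evaluated.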

yhabteab commented 6 days ago

> If there are multiple rows in icingadb_instance

That is not the problem in the referenced issue! The problem is that the active Icinga DB instance inserts an outdated heartbeat into the icingadb_instance table while remaining HA responsible. The passive instance reads this outdated heartbeat, like Icinga DB Web does, concludes that the other instance is gone, and takes over HA responsibility, so both instances end up responsible.
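The sequence can be sketched as a toy model in Python. This is not Icinga DB code: the `Instance` class, the function names, and the 60-second timeout are assumptions made only to show how a stale heartbeat leads to two responsible instances.

```python
from dataclasses import dataclass

TIMEOUT_MS = 60_000  # assumed limit after which a peer counts as gone


@dataclass
class Instance:
    name: str
    heartbeat_ms: int   # last heartbeat written to the table
    responsible: bool   # whether this instance considers itself HA responsible


def peer_seems_gone(now_ms: int, peer: Instance) -> bool:
    """A peer looks gone when its last heartbeat is older than the timeout."""
    return now_ms - peer.heartbeat_ms > TIMEOUT_MS


def passive_step(now_ms: int, me: Instance, peer: Instance) -> bool:
    """The passive side takes over if the responsible peer looks gone."""
    if peer.responsible and peer_seems_gone(now_ms, peer):
        me.responsible = True
    return me.responsible


def simulate(heartbeat_age_ms: int) -> bool:
    """Return True if both instances end up responsible."""
    now_ms = 1_000_000
    # The active instance is alive, but the heartbeat it wrote is old.
    active = Instance("active", now_ms - heartbeat_age_ms, responsible=True)
    passive = Instance("passive", now_ms, responsible=False)
    passive_step(now_ms, passive, active)
    return active.responsible and passive.responsible
```

In this model, `simulate(90_000)` ends with both instances responsible, while `simulate(5_000)` leaves only the active one responsible.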

nilmerg commented 5 days ago

> thinks that the other instance is gone

and doesn't insert a row in icingadb_instance because of this?

If so, it may be the same reason. But why is this related to a Director deployment? (@slalomsk8er wrote above that the message (competition) is caused by this every time.)