Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Master as Fallback for Satellite Zone #8491

Open ErwinE opened 3 years ago

ErwinE commented 3 years ago

Is your feature request related to a problem? Please describe.

Right now there is no option to set the master system as a fallback for the satellite zones. We have one master system running on hardware and 7 virtualized satellites around the globe. If there is a power outage at a satellite location or the virtualization platform has problems, no monitoring is available at that location. In big outages we have no clue what's wrong there: maybe only the satellite is down, but it could be anything. All checks at the satellite location stay in "OK" or "Pending", and no one gets notifications except me, the monitoring admin, even if all systems there are down.

Describe the solution you'd like

An option that allows setting the master system as a fallback for all satellite zones. It could also be a per-zone option. If the satellite isn't working, all of its checks would be executed by the master.
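To make the requested behaviour concrete, here is a minimal sketch of the decision it implies. This is not Icinga 2 code and Icinga 2 has no such setting: `SatelliteZone`, `fallback_to_master`, `pick_executor` and the endpoint names are all hypothetical. It assumes a global default that a per-zone option can override, and it simplifies HA zones by picking the first live endpoint instead of distributing checks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SatelliteZone:
    """Hypothetical model of a satellite zone (not an Icinga 2 API)."""
    name: str
    endpoints: list[str]
    # Hypothetical per-zone switch; None means "use the global default".
    fallback_to_master: bool | None = None


def pick_executor(zone: SatelliteZone, alive: set[str], master: str,
                  global_fallback: bool) -> str | None:
    """Return the endpoint that should run this zone's checks, or None."""
    # Normal case: a live satellite endpoint keeps its checks.
    # (Simplification: real HA zones spread checks across endpoints.)
    for endpoint in zone.endpoints:
        if endpoint in alive:
            return endpoint
    # All satellite endpoints are down: the per-zone setting wins if set,
    # otherwise the global default decides.
    fallback = (global_fallback if zone.fallback_to_master is None
                else zone.fallback_to_master)
    if fallback and master in alive:
        return master
    return None


eu = SatelliteZone("satellite-eu", ["sat-eu-1"])
print(pick_executor(eu, {"master1", "sat-eu-1"}, "master1", True))  # sat-eu-1
print(pick_executor(eu, {"master1"}, "master1", True))              # master1
```

Returning `None` when neither the satellite nor the master can take over keeps the "nobody is checking" case explicit rather than silently leaving services in their last state.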

Additional context

This FR was created after a discussion on the Icinga community site, where more information about thoughts and workarounds can be found: https://community.icinga.com/t/master-as-fallback-for-satellite-zone/6014

julianbrost commented 3 years ago

Can you describe in more detail how you imagine this feature working? I guess you have agents and want them to connect directly to the master if they can't reach their satellite? Do you also run checks locally on the satellite (e.g. pinging agents from there)?

Unfortunately, I don't see this happening any time soon, as it sounds like it needs high-availability features between nodes in different zones and dynamic reconfiguration of the zone hierarchy. The current architecture isn't designed for either.
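A rough way to see why the hierarchy gets in the way: in Icinga 2 each `Zone` object has at most one `parent`, and an agent's results reach the master through the endpoints of that parent zone. The simplified model below is only an illustration; the zone and endpoint names are made up, and `path_to_top` and `results_reach_master` are hypothetical helpers, not Icinga 2 functions.

```python
from __future__ import annotations

# Hypothetical example topology: every zone has at most one fixed parent.
PARENT: dict[str, str | None] = {
    "master": None,
    "satellite-eu": "master",
    "agent-web01": "satellite-eu",
}
ENDPOINTS: dict[str, list[str]] = {
    "master": ["master1"],
    "satellite-eu": ["sat-eu-1"],
    "agent-web01": ["web01"],
}


def path_to_top(zone: str) -> list[str]:
    """Zones a check result passes through on its way to the top zone."""
    path = []
    current: str | None = zone
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def results_reach_master(zone: str, alive: set[str]) -> bool:
    """True only if every zone on the fixed path has a live endpoint."""
    return all(any(ep in alive for ep in ENDPOINTS[z])
               for z in path_to_top(zone))


everything = {"master1", "sat-eu-1", "web01"}
print(path_to_top("agent-web01"))  # ['agent-web01', 'satellite-eu', 'master']
print(results_reach_master("agent-web01", everything))  # True
print(results_reach_master("agent-web01", everything - {"sat-eu-1"}))  # False
```

In this model, a master fallback would roughly mean changing `PARENT["agent-web01"]` at runtime whenever `sat-eu-1` disappears, which is the kind of dynamic reconfiguration of the zone tree that the current design does not do.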

ErwinE commented 3 years ago

We want all checks to keep working even if the satellite isn't. This includes the agents, which would connect to the master if the satellite isn't reachable, and the checks the satellite executes locally (ICMP, ping, SNMP, HTTP, ...).

Zones

NoobyNoob1983 commented 3 years ago

That's a great idea; we need this behaviour too.