Open ErwinE opened 3 years ago
Can you describe in more detail how you imagine this feature working? I guess you have agents and want them to connect directly to the master if they can't reach their satellite? Do you also run checks locally on the satellite (e.g. pinging agents from there)?
Unfortunately, I don't see this happening any time soon, as it sounds like it needs high-availability features between nodes in different zones and dynamic reconfiguration of the zone hierarchy. The current architecture isn't designed for either.
We want all checks to keep working even if the satellite isn't working. This includes the agents, which would connect to the master if the satellite isn't reachable, and the checks that the satellite executes locally (ICMP, ping, SNMP, HTTP, ...).
That's a great idea, we need this behaviour too.
Is your feature request related to a problem? Please describe.
Right now there is no option to set the master system as a fallback for the satellite zones. We have one master system running on hardware and 7 virtualized satellites around the globe. If the power fails at a satellite location or the virtualization platform has problems, no monitoring is available at that location. In big outages we have no clue what's wrong there. Maybe only the satellite is down, but it could be anything. All checks at the satellite location stay on "OK" or "Pending", and no one gets notifications except me, the monitoring admin, even if all systems there are down.
Describe the solution you'd like
An option that allows you to set the master system as a fallback for all satellite zones. It could also be an option per zone. If the satellite isn't working, all checks would be executed by the master.
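To make the requested behaviour concrete, here is a minimal Python sketch of the decision I have in mind. It is not Icinga 2 code or an existing option: `Zone`, `master_fallback` and `pick_executor` are hypothetical names used for illustration only, and how the master would detect an unreachable satellite is left open.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str                     # satellite zone name, e.g. "sat-eu"
    reachable: bool               # whether the master currently reaches the satellite
    master_fallback: bool = True  # hypothetical per-zone fallback option


def pick_executor(zone: Zone, master: str = "master") -> str:
    """Return the zone that should execute the checks of `zone`."""
    if zone.reachable:
        # Normal case: the satellite runs its own checks.
        return zone.name
    if zone.master_fallback:
        # Satellite is down and fallback is enabled: the master takes over.
        return master
    # Fallback disabled: checks stay assigned to the unreachable satellite.
    return zone.name
```

With this logic, a reachable zone always keeps its checks, and only an unreachable zone with the option enabled is handed to the master.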
Additional context
This FR was created after a discussion on the Icinga community site, where more information about thoughts and workarounds can be found: https://community.icinga.com/t/master-as-fallback-for-satellite-zone/6014