This repository is used to track all work for the BCGov Platform Services Team. This includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3.
Apache License 2.0
Switch Internal Nagios monitoring for KLAB and KLAB2 to use Nagios XI-ACTIVE #5274
Describe the issue
The main Nagios server is used by all the OS teams and is a little overloaded. From time to time this generates false Stale alerts for us. The Nagios team has asked us to move our alerts over to the secondary servers that are less loaded.
Additional context
The DXC Monitoring team will need to be actively involved from this point forward to ensure that historical data is preserved during these migrations.
How does this benefit the users of our platform?
Fewer false alerts for Platform Operations to deal with, leaving them more time to work on other things.
Definition of done
[ ] Coordinate with DXC Monitoring on the proposed workflow and refine it into a final workflow.
[ ] Test the new workflow on KLAB.
[ ] Adjust workflow details if required, based on lessons learned on KLAB.