BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team (this includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3)
Apache License 2.0

Switch Nagios alerting to the secondary servers for LAB #5218

Open StevenBarre opened 3 weeks ago

StevenBarre commented 3 weeks ago

Describe the issue

The main Nagios server is used by all the OS teams and is somewhat overloaded. From time to time this generates false Stale alerts for us. The Nagios team has asked us to move our alerts over to the secondary servers, which are less loaded.

What is the Value/Impact?

Fewer false alerts

What is the plan? How will this get completed?

Do this for each cluster, one at a time, starting in LAB

Identify any dependencies

None

Definition of done

CLAB, KLAB, KLAB2 switched over to secondary Nagios

vivekratan88 commented 2 weeks ago

Made the change; it is in review now. PR: https://github.com/bcgov-c/platform-tools/pull/213

vivekratan88 commented 1 week ago

Worked on testing. There seem to be some errors and it's not picking up the changes. I will have to see what I can do further about this.

vivekratan88 commented 1 week ago

More testing done. There seem to be errors with the API key; looking into it.

vivekratan88 commented 2 days ago

Got a new API key and the API key errors seem to be gone. The Nagios job works and so does monitoring. Working on the remaining jobs.

vivekratan88 commented 1 day ago

The key had expired, hence the API key error, so we had to send a ticket to Rowan to make settings changes on his end, and the key works now. Found the steps and solved one monitor job with Steven and the group, then later did the remaining 4 jobs. Now working on the DHCP check metric, which seems to use a shell script running in a different namespace, and troubleshooting that. The build works and the logs look clean with no issues, but I am not able to get rid of the DHCP stale check. I am now comparing how it is configured in SILVER to see if it's any different.