BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team. This includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3.
Apache License 2.0

Add monitoring for Kyverno #4027

Closed: StevenBarre closed this issue 1 year ago

StevenBarre commented 1 year ago

Describe the issue: I discovered that the Kyverno pods in Silver were crashlooping, but I had not been notified.

What is the Value/Impact? Awareness of when this critical service is not healthy.

What is the plan? How will this get completed? Can we move this to openshift-bcgov-kyverno and get the free Alertmanager rules? Should we set up some other kind of monitoring?

Identify any dependencies: Collaboration between Jason and AdvSol Ops.

Definition of done: Kyverno is properly monitored in all clusters it is deployed in.
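
For illustration only, here is a minimal sketch of the kind of check this issue is asking for: spotting containers stuck in CrashLoopBackOff from a pod listing. It is not what was deployed for this ticket. It assumes the input is the parsed JSON of a standard pod list (the shape returned by `kubectl get pods -o json`), and the function name `crashlooping_containers` is hypothetical.

```python
from typing import Any


def crashlooping_containers(pod_list: dict[str, Any]) -> list[tuple[str, str, int]]:
    """Return (pod name, container name, restart count) for every
    container whose current state is waiting with reason CrashLoopBackOff.

    `pod_list` is assumed to be a parsed Kubernetes pod list, i.e. a dict
    with an "items" array of Pod objects.
    """
    found: list[tuple[str, str, int]] = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        # containerStatuses can be absent while a pod is still being scheduled.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            # Only one of waiting/running/terminated is set at a time.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append(
                    (
                        pod_name,
                        status.get("name", "<unknown>"),
                        status.get("restartCount", 0),
                    )
                )
    return found
```

To try it, load the output of `kubectl get pods -n <namespace> -o json` with `json.load` and pass the result in; a non-empty return value means at least one container is crashlooping. A cluster-native alert rule is likely the better long-term answer, since a script like this only reports when someone runs it.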

w8896699 commented 1 year ago

PR on cerberus: https://github.com/bcgov/platform-services-sre/pull/21
CCM PR: https://github.com/bcgov-c/platform-gitops-gen/pull/717
Sysdig dashboard: can be found here

CPU and memory alerts will send notifications to rc: https://app.sysdigcloud.com/#/alerts/rules?filter=kyverno&direction=desc&sortBy=modifiedOn