Open jhouse-solvd opened 1 year ago
After speaking with @ph-One: we should prioritize "top-tier" monitors so we know which are the most important. Monitors, rulesets, alerts, metrics, etc. should be listed in order of importance. Then issues can be created to tackle them one by one.
Is there internal documentation that needs to be created or updated?
updated definition of done
@ph-One and I were thinking that we may need to re-examine PagerDuty teams and services in light of the recent team restructuring.
It might be better to scope this work to Alertmanager rules related to critical monitors, i.e. "devops-critical" and "vsp-engineers-critical" (added to background context above).
How this initiative is broken down:
@mchelen-gov - Can you add this initiative to the DE product board? It is currently in progress.
Problem Statement
Monitors and alerts are spread across multiple systems. This hampers the platform's ability to respond to incidents and support issues, and it makes the monitoring experience confusing for platform operators. Additionally, maintaining multiple monitoring systems increases administrative overhead.
Background/context
How might we...
Hypothesis or Bet
This initiative should...
We will know we're done when... ("Definition of Done")
Known Blockers/Dependencies
List any blockers or dependencies for this work to be completed
Projected Launch Date
December 31, 2022
Launch Checklist
Is this service / tool / feature...
... tested?
... documented?
... measurable?
When you're ready to launch...
Required Artifacts
Documentation
PRODUCT_NAME: directory name used for your product documentation
Testing
Measurement
TODOs