department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

[Datadog] Set up monitors and alerts for CI/CD components and services #47543

Open jhouse-solvd opened 2 years ago

jhouse-solvd commented 2 years ago

Product Outline

Monitoring using Datadog

High-Level User Story/ies

As a VA.gov Platform Operator, I need monitoring for CI/CD components and services to help maintain the availability of platform services (and their dependencies).

Hypothesis or Bet

Thoughts: Platform operators should be able to

Note: Can we tighten these up and measure specific outcomes?

Tech notes

Definition of done

What must be true in order for you to consider this epic complete?

Dashboards exist for the following:

Note: Multiple services can be displayed on the same dashboard, i.e., there does not need to be a dashboard for every service, but every service should be represented on at least one dashboard.
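
One way to verify the dashboard criterion is a simple coverage check: collect every service shown on any dashboard and subtract that set from the list of required services. The sketch below assumes the dashboard-to-service mapping is maintained by hand or exported separately; the service and dashboard names are hypothetical placeholders, not the actual CI/CD components in scope for this epic.

```python
from typing import Dict, List, Set


def services_missing_dashboard(
    required: Set[str], dashboards: Dict[str, List[str]]
) -> Set[str]:
    """Return the required services that appear on no dashboard."""
    # A service counts as covered if any dashboard lists it,
    # so several services may share one dashboard.
    covered = {svc for services in dashboards.values() for svc in services}
    return required - covered


# Hypothetical names, for illustration only.
required_services = {"service-a", "service-b", "service-c"}
dashboard_contents = {
    "ci-cd-overview": ["service-a", "service-b"],
}

print(sorted(services_missing_dashboard(required_services, dashboard_contents)))
# prints ['service-c']
```

An empty result would mean every required service is represented on at least one dashboard.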

Monitors exist for the following:

Alerts exist for the following:

Note: Alerts should be routed to the appropriate channel/personnel based on the alert's severity.
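
As a rough illustration of severity-based routing, the sketch below maps each severity level to a list of notification handles and appends them to an alert message. The severity levels, the handle names, and the `@slack-...` / `@pagerduty-...` mention style are assumptions made for the example; the actual channels and on-call rotations would need to be decided by the team.

```python
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"


# Hypothetical routing table: higher severity reaches more people.
ROUTES: Dict[Severity, List[str]] = {
    Severity.CRITICAL: ["@pagerduty-example-oncall", "@slack-example-alerts"],
    Severity.WARNING: ["@slack-example-alerts"],
    Severity.INFO: ["@slack-example-notifications"],
}


def recipients_for(severity: Severity) -> List[str]:
    """Look up the notification handles for a given severity."""
    return ROUTES[severity]


def build_alert_message(summary: str, severity: Severity) -> str:
    """Append the severity-specific handles to the alert summary."""
    handles = " ".join(recipients_for(severity))
    return f"[{severity.value.upper()}] {summary} {handles}"


print(build_alert_message("CI runner queue is backing up", Severity.WARNING))
# prints [WARNING] CI runner queue is backing up @slack-example-alerts
```

Keeping the routing table in one place makes it easy to review who is notified at each severity and to change a channel without touching individual alerts.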

mchelen-gov commented 1 year ago

Captured as a postmortem action item for https://github.com/department-of-veterans-affairs/va.gov-team-sensitive/pull/695. The context is that: