CDCgov / trusted-intermediary

Bringing together healthcare providers by reducing the connection burden.
Apache License 2.0

Azure Alerts for Errors - Azure is down #1397

Open scleary1cs opened 2 weeks ago

scleary1cs commented 2 weeks ago

Story

As a developer, I need to know if Azure is down, so that we can begin an incident.

Acceptance Criteria

Tasks

Definition of Done

Notes

pluckyswan commented 6 days ago

Created an alert under Service Health > Health alerts in the internal environment; it is currently disabled.

jherrflexion commented 4 days ago

Curious if this would be too noisy if it is looking at every Azure service? Will attempt to test this and convert to Terraform today.

halprin commented 4 days ago

> Curious if this would be too noisy if it is looking at every Azure service? Will attempt to test this and convert to Terraform today.

Yeah, we definitely don't want to be looking at services we don't use. And yes, we want all these alert stories done via Terraform.

jherrflexion commented 4 days ago

Created the azure-outage-alert branch. Enabled the ClickOps alert in internal for testing.

jherrflexion commented 4 days ago

Currently blocked by a Terraform "CreateOrUpdate" error on the test PR. I reran the job a few times and opened another new PR, but still received the same error.

pluckyswan commented 1 day ago

PR is out.

jherrflexion commented 9 hours ago

Addressing PR comments.

halprin commented 52 minutes ago

Had to revert this work because, sadly, it is failing deploys in staging. See the revert PR for the reasoning behind this. As a result, I moved this story back to In Progress.