cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Create a list of noisy errors/alarms in #notification-staging-ops #23

Closed jimleroyer closed 2 days ago

jimleroyer commented 1 year ago

Description

As a GCNotify developer, I want to know if my work will cause problems in the staging environment, so that I can resolve them ahead of time. But because there is so much alarm noise in staging, it's difficult to tell.

WHY are we building?

We need to discern real issues from noise when building our features.

WHAT are we building?

Create a list of noisy alarms and errors in the notification-staging-ops channel so that we can create future cards to address these issues.

VALUE created by our solution

Solve issues before they hit production, increase team velocity.

Acceptance Criteria

QA Steps

sastels commented 1 year ago

A few over the past week:

A lot of: New Relic Error anomaly detection - Lambda API (High); logs-1-error-1-minute-warning-lambda-api; Error percentage - admin (High)

Warnings going from OK to INSUFFICIENT_DATA: ses-complaint-rate-warning, ses-complaint-rate-critical, ses-bounce-rate-warning, ses-bounce-rate-critical

logs-10-celery-error-1-minute-critical (especially Friday June 2 - were we testing that big template or something?)

logs-1-500-error-1-minute-warning

sastels commented 1 year ago

Past 2 weeks of AWS alarms: https://docs.google.com/spreadsheets/d/1bQH8p_hSh89vqGfC-_gJsxZlmbzjGEc4Em-ooeWc0r0/edit#gid=0
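A rough sketch of how an export like this could be turned into a ranked list of noisy alarms. It assumes the alarm history is available as CSV with one row per alarm notification; the column names (`alarm_name`, `new_state`) and the sample rows are hypothetical and would need to be adjusted to the real sheet. The function name `rank_noisy_alarms` is also just illustrative.

```python
import csv
import io
from collections import Counter


def rank_noisy_alarms(csv_text, name_column="alarm_name"):
    """Count rows per alarm name and return (name, count) pairs, noisiest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the name column is missing or empty
    counts = Counter(
        row[name_column].strip() for row in reader if row.get(name_column)
    )
    # Sort by descending count, then by name so ties come out in a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical export: column names and rows are made up for illustration.
sample = """alarm_name,new_state
ses-bounce-rate-warning,INSUFFICIENT_DATA
logs-10-celery-error-1-minute-critical,ALARM
ses-bounce-rate-warning,OK
ses-bounce-rate-warning,INSUFFICIENT_DATA
"""

for name, count in rank_noisy_alarms(sample):
    print(f"{count:3d}  {name}")
```

Counting every notification, including transitions back to OK, is a deliberate simplification here: each one is a message in the channel, so each one adds to the noise.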