Closed by jimleroyer 2 days ago.
A few noisy alarms from the past week:
Many New Relic alerts: Error anomaly detection - Lambda API (High), logs-1-error-1-minute-warning-lambda-api, Error percentage - admin (High)
Alarms going from OK to INSUFFICIENT_DATA: ses-complaint-rate-warning, ses-complaint-rate-critical, ses-bounce-rate-warning, ses-bounce-rate-critical
logs-10-celery-error-1-minute-critical (especially on Friday, June 2; were we testing that big template or something?)
logs-1-500-error-1-minute-warning
Past 2 weeks of AWS alarms: https://docs.google.com/spreadsheets/d/1bQH8p_hSh89vqGfC-_gJsxZlmbzjGEc4Em-ooeWc0r0/edit#gid=0
Description
As a GCNotify developer, I want to know whether my work will cause problems in the staging environment, so that I can resolve them ahead of time. However, because there is so much alarm noise in staging, it's difficult to tell.
WHY are we building?
We need to discern real issues from noise when building our features.
WHAT are we building?
Create a list of noisy alarms and errors in the notification-staging-ops channel so that we can create future cards to address these issues.
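One way to build that list is to rank alarms by how often they change into a noisy state. The sketch below assumes alarm state changes have already been exported (for example from CloudWatch alarm history or the spreadsheet above) into simple records with hypothetical `alarm`, `old`, and `new` fields. The record shape, the `rank_noisy_alarms` helper, and the sample data are illustrative assumptions, not real staging history.

```python
from collections import Counter
from typing import Iterable, Mapping

# Assumption: each record is a hypothetical dict {"alarm": name, "old": state, "new": state},
# using CloudWatch's OK / ALARM / INSUFFICIENT_DATA state values.
NOISY_STATES = {"ALARM", "INSUFFICIENT_DATA"}


def rank_noisy_alarms(
    transitions: Iterable[Mapping[str, str]], top_n: int = 10
) -> list[tuple[str, int]]:
    """Count transitions into a noisy state per alarm and return the top_n noisiest."""
    counts: Counter[str] = Counter()
    for t in transitions:
        # Only count real state changes that land in a noisy state.
        if t["new"] in NOISY_STATES and t["old"] != t["new"]:
            counts[t["alarm"]] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample transitions for illustration only.
    sample = [
        {"alarm": "ses-bounce-rate-warning", "old": "OK", "new": "INSUFFICIENT_DATA"},
        {"alarm": "ses-bounce-rate-warning", "old": "INSUFFICIENT_DATA", "new": "OK"},
        {"alarm": "ses-bounce-rate-warning", "old": "OK", "new": "INSUFFICIENT_DATA"},
        {"alarm": "logs-10-celery-error-1-minute-critical", "old": "OK", "new": "ALARM"},
    ]
    for name, count in rank_noisy_alarms(sample):
        print(f"{name}: {count}")
```

Counting only transitions into ALARM or INSUFFICIENT_DATA (rather than every history entry) keeps recoveries back to OK from inflating an alarm's noise score; the top of the resulting list is a reasonable starting point for follow-up cards.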
VALUE created by our solution
Solve issues before they hit production and increase team velocity.
Acceptance Criteria
QA Steps