cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Investigate Bounce Rate Alarms #124

Open ben851 opened 1 year ago

ben851 commented 1 year ago

Description

As a developer/operator of GC Notify, I would like to be alerted only when there are actual issues with our system, and not by false alarms, so that I avoid alert fatigue and can quickly identify real errors.

This card covers the following alerts in the alarm review spreadsheet

WHY are we building?

We are receiving a lot of noise in our operations Slack channel that is not indicative of actual issues.

WHAT are we building?

Investigate the bounce rate errors and determine whether they can be fixed or whether the alarm needs adjustment.

VALUE created by our solution

Fewer false alarms will increase developer agility and shorten response times to actual issues.

Acceptance Criteria

QA Steps

andrewleith commented 1 year ago
whabanks commented 1 year ago

ses-bounce-rate-warning and ses-complaint-rate-critical are often firing with INSUFFICIENT_DATA in staging. We should discuss tuning how these alarms treat missing or insufficient data.
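As a starting point for that discussion, here is a minimal Python sketch of the parameters a CloudWatch alarm on an SES reputation metric might take, with the missing-data behaviour made explicit. The helper name `build_ses_rate_alarm`, the threshold, the period, and the statistic are assumptions for illustration only; they are not taken from our Terraform. `TreatMissingData` is the CloudWatch setting that controls how missing datapoints are evaluated, and `notBreaching` is just one of the options we could consider.

```python
# Valid values for CloudWatch's TreatMissingData alarm setting.
ALLOWED_MISSING_DATA = {"breaching", "notBreaching", "ignore", "missing"}


def build_ses_rate_alarm(alarm_name, metric_name, threshold,
                         treat_missing_data="notBreaching"):
    """Build keyword arguments for a CloudWatch alarm on an SES metric.

    Hypothetical helper: the period, statistic, and comparison operator
    below are illustrative assumptions, not our deployed configuration.
    """
    if treat_missing_data not in ALLOWED_MISSING_DATA:
        raise ValueError(f"unsupported TreatMissingData: {treat_missing_data}")
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/SES",
        "MetricName": metric_name,
        "Statistic": "Average",
        "Period": 3600,  # assumed: one-hour evaluation window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # With "notBreaching", missing datapoints count as within threshold,
        # so a quiet environment is not evaluated as a breach.
        "TreatMissingData": treat_missing_data,
    }


# Assumed threshold of 0.05 (5%), for illustration only.
bounce_warning = build_ses_rate_alarm(
    "ses-bounce-rate-warning", "Reputation.BounceRate", 0.05
)
```

The resulting dictionary could be passed as keyword arguments to boto3's CloudWatch `put_metric_alarm`, or translated into the equivalent alarm attributes in Terraform.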

jimleroyer commented 1 year ago

We need to look more closely at the services that trigger these alarms. Staging normally shouldn't trigger them. Let's identify those services so we know what the cause could be.

jimleroyer commented 1 year ago

@whabanks and I looked into it and came up with this PR: https://github.com/cds-snc/notification-terraform/pull/815

ben851 commented 1 year ago