cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Tweak bounce rate alarms #132

Closed whabanks closed 1 year ago

whabanks commented 1 year ago

Currently there are four alarms in staging that are related to tracking bounce rate. While bounce rate is an important metric to track frequently in production, in staging we may be able to reduce noise by increasing thresholds.

| Alarm | Description | Suggestion |
| --- | --- | --- |
| logs-1-critical-bounce-rate-1-minute-warning | One service exceeding 10% bounce rate in 1 minute | Increase to 2 minutes |
| logs-1-warning-bounce-rate-1-minute-warning | One service exceeding 5% bounce rate in 1 minute | Increase to 2 minutes |
| ses-bounce-rate-critical | Bounce rate >=7% over the last 12 hours | Increase to 24 hours, many triggers were due to insufficient data |
| ses-bounce-rate-warning | Bounce rate >=5% over the last 12 hours | Increase to 24 hours, many triggers were due to insufficient data |
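
To illustrate why a longer window might cut down on insufficient-data triggers for the two SES alarms, here is a rough Python sketch. It is a simplified model only, not how CloudWatch evaluates alarms: the 5% and 7% thresholds come from the table, while the function names, the hourly `(bounces, sends)` sample format, and the state labels `WARNING` and `CRITICAL` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Thresholds from the table above (SES alarms). Everything else here is hypothetical.
SES_WARNING_THRESHOLD = 0.05
SES_CRITICAL_THRESHOLD = 0.07


def bounce_rate(bounces: int, sends: int) -> Optional[float]:
    """Return bounces / sends, or None when nothing was sent in the window."""
    if sends == 0:
        return None
    return bounces / sends


def classify(bounces: int, sends: int) -> str:
    """Map a window's totals to an illustrative alarm state."""
    rate = bounce_rate(bounces, sends)
    if rate is None:
        return "INSUFFICIENT_DATA"
    if rate >= SES_CRITICAL_THRESHOLD:
        return "CRITICAL"
    if rate >= SES_WARNING_THRESHOLD:
        return "WARNING"
    return "OK"


def classify_window(hourly: List[Tuple[int, int]], hours: int) -> str:
    """Classify the most recent `hours` entries of hourly (bounces, sends) samples."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    recent = hourly[-hours:]
    total_bounces = sum(b for b, _ in recent)
    total_sends = sum(s for _, s in recent)
    return classify(total_bounces, total_sends)


# Hypothetical staging traffic: 100 sends and 2 bounces in one hour,
# followed by 23 quiet hours with no email sent at all.
staging_hourly = [(2, 100)] + [(0, 0)] * 23

print(classify_window(staging_hourly, 12))  # no sends in the last 12 hours
print(classify_window(staging_hourly, 24))  # the 24-hour window includes the busy hour
```

With this made-up traffic pattern, the 12-hour window has no sends and lands in `INSUFFICIENT_DATA`, while the 24-hour window has enough data to evaluate to `OK`.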

Related doc: https://docs.google.com/spreadsheets/d/1KSyWaDdy4bIhiXc_Oqhi6QBUHhUIzyqqfpHR6Vu2WMU/edit#gid=0

jimleroyer commented 1 year ago

@whabanks, do you know which services in staging would trigger these bounce rate alarms? I wouldn't expect any to.

whabanks commented 1 year ago

@jimleroyer I agree it's unlikely that individual services in staging would trigger the first two alarms. Given that, I am wondering how useful ses-bounce-rate-critical and ses-bounce-rate-warning are in staging. Most of what I currently see in #notification-staging-ops is repeated INSUFFICIENT_DATA state changes, which feel like noise.
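
One way to put a number on that noise would be to tally the state changes by whether they involve INSUFFICIENT_DATA. The sketch below assumes the transitions have already been collected as `(alarm_name, old_state, new_state)` tuples; that input format, the function name, and the sample data are all hypothetical and not taken from the channel.

```python
from collections import Counter
from typing import Iterable, Tuple

Transition = Tuple[str, str, str]  # (alarm_name, old_state, new_state), hypothetical format


def summarize_transitions(transitions: Iterable[Transition]) -> Counter:
    """Count transitions that touch INSUFFICIENT_DATA versus all other transitions."""
    counts: Counter = Counter()
    for _name, old_state, new_state in transitions:
        if "INSUFFICIENT_DATA" in (old_state, new_state):
            counts["insufficient_data"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up sample, for illustration only.
sample = [
    ("ses-bounce-rate-warning", "OK", "INSUFFICIENT_DATA"),
    ("ses-bounce-rate-warning", "INSUFFICIENT_DATA", "OK"),
    ("ses-bounce-rate-critical", "OK", "INSUFFICIENT_DATA"),
    ("ses-bounce-rate-critical", "OK", "ALARM"),
]

summary = summarize_transitions(sample)
print(summary["insufficient_data"], summary["other"])
```

A high share of `insufficient_data` transitions relative to `other` would support the impression that most messages are noise rather than real bounce rate changes.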

whabanks commented 1 year ago

Upon further reflection, it would not make sense to change the above alarms specifically for staging:

  1. It would create a discrepancy between our staging and prod environments.
  2. These alarms are quite new; they have not been in place long enough for us to see their added value.
  3. They have not created that much noise and are easily discernible from real issues.