cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Investigate Lambda API Alerts #125

Open ben851 opened 1 year ago

ben851 commented 1 year ago

Description

As a developer/operator of GC Notify, I would like to be alerted only when there are actual issues with our system, and not during false alarms, so that I do not get alert fatigue and can quickly identify real errors.

This card covers the following alerts in the alarm review spreadsheet

WHY are we building?

We are receiving a lot of noise in our operations Slack channel that is not indicative of actual issues.

WHAT are we building?

Investigate the API Lambda errors and determine whether they can be fixed or whether the alarms need adjustment.

VALUE created by our solution

Fewer false alarms will improve developer agility and shorten response times to actual issues.

Acceptance Criteria

QA Steps

andrewleith commented 1 year ago

Alarm causes have been documented here.

There are 4 alarms with causes that need to be investigated:

I've started a Slack thread tagging relevant parties so we can decide what to do with these 4.

andrewleith commented 1 year ago

Only one issue remains, related to the Redis maintenance period. I will track that issue separately; this card is now complete.

jimleroyer commented 1 year ago

We have an existing card for the Redis maintenance period issue: https://app.zenhub.com/workspaces/notify-planning-core-6411dfb7c95fb80014e0cab0/issues/gh/cds-snc/notification-planning-core/135

Let's move this card to done. Thanks @andrewleith