Open ben851 opened 1 year ago
Alarm causes have been documented here.
There are 4 alarms with causes that need to be investigated:
I've started a Slack thread tagging the relevant parties so we can decide what to do with these 4.
Only one issue remains, related to the Redis maintenance period. Will track that issue separately; this is now complete.
We have an existing card for the Redis maintenance period issue: https://app.zenhub.com/workspaces/notify-planning-core-6411dfb7c95fb80014e0cab0/issues/gh/cds-snc/notification-planning-core/135
Let's move this card to done. Thanks @andrewleith
Description
As a developer/operator of GC Notify, I would like to be alerted only when there are actual issues with our system, and not by false alarms, so that I do not get alert fatigue and am able to quickly identify real errors.
This card covers the following alerts in the alarm review spreadsheet
WHY are we building?
We are receiving a lot of noise in our operations Slack channel that is not indicative of actual issues.
WHAT are we building?
Investigate the API Lambda errors and determine whether they can be fixed or whether the alarm needs adjustment.
VALUE created by our solution
Fewer false alarms will increase developer agility and improve response times to actual issues.
Acceptance Criteria
QA Steps