Closed ben851 closed 1 year ago
Our Celery log data was missing or inconclusive during the investigation. A new card was created to monitor and continue the investigation so that we can properly tune these alarms.
This card can be closed, as the remaining issue is now tracked in the new card linked in the previous comment by @whabanks.
Description
As a developer/operator of GC Notify, I would like to be alerted only when there are actual issues with our system, and not by false alarms, so that I do not get alert fatigue and can quickly identify real errors.
This card covers the following alerts in the alarm review spreadsheet
WHY are we building?
We are receiving a lot of noise in our operations Slack channel that is not indicative of actual issues.
WHAT are we building?
Investigate the Priority SQS queue alarms and determine whether the underlying issues can be fixed or whether the alarms need adjustment.
VALUE created by our solution
Fewer false alarms will increase developer agility and shorten response times to actual issues.
Acceptance Criteria
QA Steps