Most alarms in the past month were related to the fluentd log switch and were therefore false positives. Some others were related to known incidents. No adjustments appear to be necessary on our side.
Description
As a developer/operator of GC Notify, I would like to be alerted only when there are actual issues with our system, and not by false alarms, so that I do not get alert fatigue and can quickly identify real errors.
This card covers the following alerts in the alarm review spreadsheet
WHY are we building?
We are receiving a lot of noise in our operations Slack channel that is not indicative of actual issues.
WHAT are we building?
Investigate the Celery 500 errors and determine whether they can be fixed or whether the alarm needs adjustment.
VALUE created by our solution
Fewer false alarms will increase developer agility and shorten response times to actual issues.
Acceptance Criteria
QA Steps