Open jimleroyer opened 1 month ago
The NewRelic alarm is over there 👈
We haven't had a warning triggered yet (i.e. after lowering the threshold to test) :/
The alarm fired after the threshold was lowered. I also modified the query so that the alarm reports the URL that is timing out. After that change, the alarm was tested with the low threshold, and then tested again successfully. I then raised the threshold, and we haven't seen the alarm since (as expected).
:+1: looks great!
Reviewed and QA'ed!! :)
Description
As an ops lead, I need to know when an external service request is timing out (i.e. via an API callback), so that I can take appropriate action to prevent a slowdown of the system (such as removing the culprit callback).
WHY are we building?
API callbacks can slow down our overall notification processing pipeline, in particular the Celery tasks that are neighbors of the callback task (such as database saving, which is vital to the health of the pipeline). Hence we need to identify when such a scenario occurs so that we can take appropriate action, such as removing the configured callback that is slowing down the pipeline.
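To make the failure mode concrete, here is a minimal sketch of a callback sender that bounds how long a single request can take and logs the URL when it times out. It is an illustration only, not the code used in this project: the function name `send_callback`, the 5-second default and the logger name are all assumptions, and it uses only the Python standard library.

```python
import logging
import socket
from urllib import error, request

logger = logging.getLogger("callback_sketch")  # hypothetical logger name


def send_callback(url, payload, timeout_s=5.0, opener=request.urlopen):
    """POST `payload` (bytes) to `url`; return True on a 2xx response.

    A timeout is logged together with the offending URL and reported as
    False. Any other error is re-raised unchanged.
    """
    req = request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(req, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.
        if not isinstance(exc.reason, (socket.timeout, TimeoutError)):
            raise
    except (socket.timeout, TimeoutError):
        # Read-phase timeouts are raised directly.
        pass
    # Only timeouts reach this point; include the URL so it can be reported.
    logger.warning("callback timed out: url=%s timeout_s=%.1f", url, timeout_s)
    return False
```

The `opener` parameter exists only so the function can be exercised without network access. In a real worker the log line (or an equivalent metric) is what a monitoring query would match on to report the timing-out URL.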
WHAT are we building?
A warning alarm that reports back to the #notification-ops Slack channel.
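The threshold behaviour of such an alarm can be sketched in a few lines. This is a hypothetical in-process illustration of "warn once N timeouts occur within a time window", not the monitoring tool's own logic: the class name `TimeoutAlarm`, its parameters and the message format are assumptions, and delivering the message to Slack is left to the `notify` callable supplied by the caller.

```python
import time
from collections import deque


class TimeoutAlarm:
    """Fire `notify` once `threshold` timeouts occur within `window_s` seconds."""

    def __init__(self, threshold, window_s, notify, clock=time.monotonic):
        self._threshold = threshold
        self._window_s = window_s
        self._notify = notify        # e.g. a function that posts to a chat channel
        self._clock = clock          # injectable so the window can be tested
        self._events = deque()       # (timestamp, url) pairs, oldest first

    def record_timeout(self, url):
        """Record one timeout; return True if the alarm fired."""
        now = self._clock()
        self._events.append((now, url))
        # Drop events that have fallen out of the sliding window.
        while self._events and now - self._events[0][0] > self._window_s:
            self._events.popleft()
        if len(self._events) < self._threshold:
            return False
        # Report every distinct URL seen in the window, then reset.
        urls = sorted({u for _, u in self._events})
        self._notify(
            f"WARNING: {len(self._events)} callback timeouts in the last "
            f"{self._window_s:g}s: {', '.join(urls)}"
        )
        self._events.clear()
        return True
```

Lowering `threshold` makes the alarm easy to trigger for a test, and raising it again keeps it quiet under normal load.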
VALUE created by our solution
Better awareness of potential faulty API callbacks
Acceptance Criteria
QA Steps