cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Alert warning when an external service (API callback) gets past a predefined threshold #356

Open jimleroyer opened 1 month ago

jimleroyer commented 1 month ago

Description

As an ops lead, I need to know when an external service request (i.e., an API callback) is timing out, so that I can take appropriate action to prevent a slowdown of the system (such as removing the culprit callback).

WHY are we building?

API callbacks can slow down our overall notification processing pipeline for Celery tasks that neighbor the callback task (such as database saving, which is vital to the health of the pipeline). Hence we need to identify when such a scenario occurs so that we can take appropriate action, such as removing the configured callback that is slowing down the pipeline.

WHAT are we building?

A warning alarm that reports back to the #notification-ops Slack channel.
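As a rough illustration of the kind of check such an alarm performs, here is a minimal Python sketch. It is not the actual NewRelic alarm or query; `CallbackSample`, `slow_callbacks`, `format_warning`, the example URLs, and the 5-second threshold are all hypothetical names and values chosen for illustration. It groups slow callback requests by URL and builds the text of a warning message.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallbackSample:
    """One observed API callback request (hypothetical shape)."""
    url: str           # callback endpoint that was called
    duration_s: float  # how long the request took, in seconds


def slow_callbacks(samples, threshold_s, min_count=1):
    """Return {url: count} for URLs with at least `min_count`
    requests that took `threshold_s` seconds or longer."""
    counts = defaultdict(int)
    for sample in samples:
        if sample.duration_s >= threshold_s:
            counts[sample.url] += 1
    return {url: n for url, n in counts.items() if n >= min_count}


def format_warning(offenders, threshold_s):
    """Build a plain-text warning listing each offending URL."""
    lines = [f"Warning: API callbacks slower than {threshold_s:g}s"]
    for url, n in sorted(offenders.items()):
        lines.append(f"- {url}: {n} slow call(s)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed sample data; real values would come from monitoring.
    samples = [
        CallbackSample("https://a.example/cb", 6.0),
        CallbackSample("https://a.example/cb", 7.5),
        CallbackSample("https://b.example/cb", 0.2),
    ]
    offenders = slow_callbacks(samples, threshold_s=5.0)
    if offenders:
        print(format_warning(offenders, 5.0))
```

Reporting per URL, rather than a single aggregate count, makes it easier to tell which configured callback is the culprit.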

VALUE created by our solution

Better awareness of potential faulty API callbacks

Acceptance Criteria

QA Steps

jimleroyer commented 1 month ago

The NewRelic alarm is over there 👈

sastels commented 1 month ago

We haven't had a warning triggered yet (i.e., after lowering the threshold to test) :/

jimleroyer commented 1 month ago

The alarm fired after the threshold was lowered. I also modified the query so that the alarm reports the URL that is timing out. After that change, we tested the alarm again with the low threshold, and it fired successfully. I then raised the threshold, and we haven't seen the alarm since (as expected).

sastels commented 1 month ago

:+1: looks great!

P0NDER0SA commented 1 month ago

Reviewed and QA'ed!! :)