cds-snc / notification-planning

Project planning for GC Notify Team

Circuit breaker on the callback URL when it fails too much #1536

Closed (sastels closed 4 weeks ago)

sastels commented 5 months ago

Description

As a Notify team member, I need Notify to be stable and reliable.

WHY are we building? Callback failures can cause system slowdowns / instability.

WHAT are we building? Stop sending to a callback that has been generating errors.

VALUE created by our solution: A more stable and reliable system.
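
For discussion, here is a minimal sketch of what "stop sending to a callback that has been generating errors" could look like as a per-URL circuit breaker. Everything in it is an assumption and not Notify code: the class name `CallbackCircuitBreaker`, the `failure_threshold` and `cooldown_seconds` defaults, and the in-memory storage are all hypothetical (a real deployment would probably need shared state across workers).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CallbackCircuitBreaker:
    """Hypothetical per-URL breaker; names and thresholds are illustrative only."""

    failure_threshold: int = 5        # consecutive failures before we stop sending
    cooldown_seconds: float = 60.0    # how long to stop sending once tripped
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _failures: Dict[str, int] = field(init=False, default_factory=dict)
    _opened_at: Dict[str, float] = field(init=False, default_factory=dict)

    def allow(self, url: str) -> bool:
        """Return True if a callback to `url` may be attempted right now."""
        opened = self._opened_at.get(url)
        if opened is None:
            return True  # breaker is closed: send normally
        # Breaker is open: only allow attempts once the cooldown has elapsed.
        return self.clock() - opened >= self.cooldown_seconds

    def record_success(self, url: str) -> None:
        """A successful callback resets the failure count and closes the breaker."""
        self._failures.pop(url, None)
        self._opened_at.pop(url, None)

    def record_failure(self, url: str) -> None:
        """Count a failure; open (or re-open) the breaker at the threshold."""
        count = self._failures.get(url, 0) + 1
        self._failures[url] = count
        if count >= self.failure_threshold:
            self._opened_at[url] = self.clock()
```

A caller would check `allow(url)` before each callback attempt and then report the outcome with `record_success` or `record_failure`. After the cooldown, attempts are allowed again, and a single further failure re-opens the breaker immediately because the failure count is still at or above the threshold.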

Acceptance Criteria

  1. TF warning alert fires: send a warning email
  2. TF critical alert fires: send a suspension email
  3. Ops lead suspends the service manually
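
The first two criteria amount to a mapping from alert level to email type. A rough sketch of that mapping follows; the threshold values, the `CallbackAction` enum, and the function name are hypothetical and are not taken from the actual alert definitions. Step 3 stays manual, so it is deliberately not modelled here.

```python
from enum import Enum


class CallbackAction(Enum):
    """Hypothetical outcomes matching acceptance criteria 1 and 2."""

    NONE = "none"
    WARNING_EMAIL = "warning_email"
    SUSPENSION_EMAIL = "suspension_email"


def action_for_failure_count(
    failures: int,
    warning_threshold: int = 10,   # assumed value, not from the real alerts
    critical_threshold: int = 50,  # assumed value, not from the real alerts
) -> CallbackAction:
    """Map a callback failure count in some window to the email to send."""
    if failures >= critical_threshold:
        return CallbackAction.SUSPENSION_EMAIL
    if failures >= warning_threshold:
        return CallbackAction.WARNING_EMAIL
    return CallbackAction.NONE
```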

QA Steps

Related Incident 2024-03-28-notify-is-spitting-out-a-bunch-of-errors

whabanks commented 3 months ago

As part of the work on 1564, I sketched out what this might look like in implementation.

image.png

jzbahrai commented 4 weeks ago

Based on the ADR for callbacks, we are not automating most of the callback flows. I am going to close this ticket: the parts of it that are still relevant are covered by other tickets, and anything that is not covered we will pick up in version 2 of callbacks if needed.