Open jimleroyer opened 7 months ago
For send-email-high queue visibility timeout: https://github.com/cds-snc/notification-terraform/pull/1029
Need to run the import on this, which will be done today.
In production. Ready for QA
Jimmy will look at high priority email queue delays for the past week.
@jimleroyer will check this after stand up.
Today I will, I promise
Using the Blazer query "[SLO] / [Sending time] High priority send times" against the production environment.
BEFORE: From Nov 14 to Nov 21:
label notification_count
<1s 37,205
<2s 5,980
<3s 1,310
<4s 342
<5s 129
<6s 58
<7s 22
<8s 20
<9s 21
<10s 14
<11s 15
<12s 19
<13s 18
<14s 17
<15s 10
<16s 3
<17s 69
<18s 66
<19s 22
<20s 15
<21s 8
<22s 8
<23s 7
<24s 9
<25s 6
<26s 3
<27s 68
<28s 9
<29s 1
<30s 2
<31s 2
<32s 2
<33s 3
<34s 1
<36s 1
<37s 1
<39s 1
<46s 1
<57s 1
<61s 1
<62s 1
<63s 2
<67s 2
<68s 1
<69s 1
<77s 1
<78s 1
<82s 1
<100s 1
<111s 1
<127s 1
<146s 1
<153s 1
<159s 1
<161s 1
<176s 1
<186s 1
<231s 1
<236s 1
<237s 2
<249s 1
<257s 1
<266s 1
<311s 1
<320s 1
<336s 1
<350s 1
<356s 1
<374s 1
<381s 1
<394s 1
AFTER: From Nov 21 to Nov 27:
label notification_count
<1s 31,666
<2s 5,172
<3s 988
<4s 197
<5s 56
<6s 24
<7s 16
<8s 5
<9s 7
<10s 8
<11s 12
<12s 11
<13s 3
<14s 3
<15s 1
<16s 4
<17s 52
<18s 31
<19s 11
<20s 2
<21s 3
<27s 24
<28s 3
<29s 1
<30s 1
<33s 1
<34s 1
<35s 1
We did get 2 anomalies in the past 2 days, though. Given the issues we had, I excluded these from the AFTER report above. Even counting them, the number of slow notifications is far lower than before. However, we should check again in a while to confirm that we have completely eliminated high priority notifications taking more than 5 minutes.
label notification_count
<302s 1
<16078s 1
Description
As a user of GCNotify, I want high priority email notifications to be sent within 1 minute, so that I can rely on the product and get my messages to users quickly enough.
As an ops lead of GCNotify, I want high priority email notifications to be sent within 1 minute, so that the alarms do not trigger on an SLO violation.
WHY are we building?
WHAT are we building?
Reduce the retry period for high priority email notifications. The retry currently kicks off 5 minutes after the initial attempt, which is already well past the SLO of 99% sent within 1 minute (90% within 20 seconds).
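To make the SLO targets concrete, here is a minimal sketch that checks the bucketed send times from the AFTER report (Nov 21 to Nov 27) against them. It assumes a label such as `<17s` means "sent in under 17 seconds", and the helper names (`share_within`, `meets_slo`) are hypothetical, not part of the Blazer query or the GCNotify codebase.

```python
from collections import Counter

# Bucket upper bound in seconds -> notification_count, copied from the
# AFTER table (Nov 21 to Nov 27). Assumption: "<Ns" means under N seconds.
AFTER_BUCKETS = {
    1: 31666, 2: 5172, 3: 988, 4: 197, 5: 56, 6: 24, 7: 16, 8: 5, 9: 7,
    10: 8, 11: 12, 12: 11, 13: 3, 14: 3, 15: 1, 16: 4, 17: 52, 18: 31,
    19: 11, 20: 2, 21: 3, 27: 24, 28: 3, 29: 1, 30: 1, 33: 1, 34: 1, 35: 1,
}

# The two anomalies reported separately in the thread.
ANOMALIES = {302: 1, 16078: 1}

# SLO targets as stated in the issue: 90% within 20 s, 99% within 60 s.
SLO_TARGETS = [(20, 0.90), (60, 0.99)]


def share_within(buckets, threshold_s):
    """Fraction of notifications whose bucket upper bound is <= threshold_s."""
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("no notifications in buckets")
    within = sum(n for upper, n in buckets.items() if upper <= threshold_s)
    return within / total


def meets_slo(buckets, targets=SLO_TARGETS):
    """True when every (threshold, minimum share) target is satisfied."""
    return all(share_within(buckets, t) >= share for t, share in targets)


if __name__ == "__main__":
    # Merge the anomalies back in to see the effect on the overall shares.
    with_anomalies = dict(Counter(AFTER_BUCKETS) + Counter(ANOMALIES))
    for name, data in [("AFTER", AFTER_BUCKETS), ("AFTER + anomalies", with_anomalies)]:
        for threshold, target in SLO_TARGETS:
            share = share_within(data, threshold)
            print(f"{name}: {share:.4%} within {threshold}s (target {target:.0%})")
        print(f"{name}: meets SLO = {meets_slo(data)}")
```

This only measures the share of notifications under each threshold over a one-week window; it says nothing about whether the alarms themselves evaluate the SLO over the same window.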
VALUE created by our solution
Acceptance Criteria
QA Steps
Additional information
There are two areas where the changes could be made: