Closed: amberrignell closed this issue 1 year ago.
To make testing easier in dev mode, the time window is set to 1 minute (instead of 24 hours), so you can run failing jobs and check /dev/mailbox to view the emails. @amberrignell
For development purposes, failure notifications are enabled by default. What is needed here? @amberrignell
By default, amnesia should store to disk; we will deal with the remaining questions around persistence after this has been merged.
User story
As a user that has a project with critical workflows in it, I want to be notified every time there is a run failure, so that I can see what the issue is and decide whether to address it or not.
As a user that has enabled realtime alerts, if a given workflow fails more than 5 times in 24 hours, I want to stop receiving emails about it so that my inbox does not get cluttered.
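The second story boils down to a per-workflow sliding window: count the workflow's failed runs over the last 24 hours, including the one that just happened, and send an email only while that count is 5 or less. Below is a minimal Python sketch of that rule, not the project's implementation. The `FailureAlertLimiter` class and `record_failure` method are hypothetical names; state is held in memory rather than persisted with amnesia; failures are assumed to be recorded in chronological order; and a failure exactly 24 hours old is assumed to fall outside the window.

```python
from __future__ import annotations

from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailureAlertLimiter:
    """Hypothetical sliding-window counter of failed runs, one window per workflow."""

    def __init__(self, window: timedelta = timedelta(hours=24), max_alerts: int = 5):
        self.window = window
        self.max_alerts = max_alerts
        # workflow_id -> timestamps of failed runs still inside the window, oldest first.
        # Assumption: kept in memory only; a real version would persist this.
        self._failures: dict[str, deque[datetime]] = defaultdict(deque)

    def record_failure(self, workflow_id: str, at: datetime) -> tuple[bool, int]:
        """Record a failure and return (should_notify, failures_in_window).

        Assumes calls arrive in chronological order for each workflow.
        """
        failures = self._failures[workflow_id]
        # Drop failures that are 24 hours old or older relative to this one.
        while failures and at - failures[0] >= self.window:
            failures.popleft()
        failures.append(at)
        count = len(failures)
        # Notify for the 1st..5th failure in the window, stay silent afterwards.
        return count <= self.max_alerts, count
```

For dev mode, where the window is shortened to 1 minute, the same class could be built as `FailureAlertLimiter(window=timedelta(minutes=1))`. Because each workflow has its own deque, failures in one workflow never suppress alerts for another.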
Details
See the spreadsheet below for an example: https://docs.google.com/spreadsheets/d/13eOkUKDbK_jHDp58f70v3dGQwmtF9V4j0_RmoliG6YA/edit?usp=sharing
The email should be as follows:
Subject: ${n. failures}th failure for workflow ${workflow_name}
Body: Hi ${user first name}, Work order ${work_order_id} failed for workflow ${workflow_name} with the following logs:
Please view it [here](${link_to_run}) to debug the issue.
*This is the ${n. failures} failure in the last 24 hours. If the workflow has more than 5 failed runs in the last 24 hours, you will stop receiving these alerts.
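The template above maps onto plain string formatting. The Python sketch below is illustrative only: `build_failure_email`, `FailureEmail` and `ordinal` are hypothetical names, the parameters simply mirror the template placeholders, the logs are assumed to arrive as a ready-made string, and the subject is assumed to use a proper English ordinal (1st, 2nd, 3rd, 4th) instead of always appending "th".

```python
from dataclasses import dataclass


@dataclass
class FailureEmail:
    subject: str
    body: str


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th..20th, including 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def build_failure_email(
    first_name: str,
    work_order_id: str,
    workflow_name: str,
    n_failures: int,
    logs: str,
    link_to_run: str,
    max_alerts: int = 5,
) -> FailureEmail:
    """Hypothetical helper: fill in the failure-alert template from the issue."""
    nth = ordinal(n_failures)
    subject = f"{nth} failure for workflow {workflow_name}"
    body = (
        f"Hi {first_name},\n\n"
        f"Work order {work_order_id} failed for workflow {workflow_name} "
        "with the following logs:\n\n"
        f"{logs}\n\n"
        f"Please view it here ({link_to_run}) to debug the issue.\n\n"
        f"*This is the {nth} failure in the last 24 hours. If the workflow has "
        f"more than {max_alerts} failed runs in the last 24 hours, "
        "you will stop receiving these alerts."
    )
    return FailureEmail(subject=subject, body=body)
```

For example, `build_failure_email("Ada", "wo-123", "daily-sync", 3, "Error: timeout", "https://example.com/runs/1")` produces the subject "3rd failure for workflow daily-sync" (all argument values here are made up).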
Implementation notes
A quick proposal for the solution
Release notes
User acceptance criteria
Given a user that has enabled realtime alerts for a project (failure_alerts set to true),
[ ] (a) when a run in that project fails, they should receive an email notification. That email notification should include the time of the run, the workflow name, work order ID and a link to the failed run.
[ ] (b) when 6 runs from the same workflow fail within 24 hours, they should receive only 5 emails.
[ ] (c) when 6 runs from the same workflow and one run from a different workflow fail within 24 hours, the user should receive 6 emails (5 about the first workflow, 1 about the second).
[ ] (d) given a run failure, if that workflow has 1 failed run that happened 25 hours ago and 4 failed runs that happened less than 24 hours ago, the user should be notified by email.
[ ] Given a user that has NOT enabled realtime alerts for a project, when a run in that project fails, they should NOT receive an email notification.
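The criteria above read naturally as test cases. The Python sketch below replays a project's failures in time order and counts how many emails each workflow would trigger, so scenarios (b), (c), (d) and the opt-out case can be checked directly. It is a standalone illustration, not project code: `simulate_alerts` is a hypothetical function, the project's `failure_alerts` setting is modelled as a plain boolean argument, and the 24-hour window with a 5-email cap is assumed, with a failure exactly 24 hours old counted as outside the window.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta


def simulate_alerts(
    failures: list[tuple[str, datetime]],
    failure_alerts: bool,
    window: timedelta = timedelta(hours=24),
    max_alerts: int = 5,
) -> Counter:
    """Replay (workflow_id, failed_at) pairs and count emails sent per workflow."""
    sent: Counter = Counter()
    if not failure_alerts:
        # Realtime alerts disabled for the project: no emails at all.
        return sent
    history: dict[str, list[datetime]] = {}  # workflow_id -> failures in window
    for workflow_id, failed_at in sorted(failures, key=lambda f: f[1]):
        recent = [t for t in history.get(workflow_id, []) if failed_at - t < window]
        recent.append(failed_at)
        history[workflow_id] = recent
        # Email only while this workflow has at most `max_alerts` failures in the window.
        if len(recent) <= max_alerts:
            sent[workflow_id] += 1
    return sent
```

With six failures of one workflow inside an hour, `simulate_alerts(..., failure_alerts=True)` reports 5 emails for that workflow, matching (b); adding one failure from another workflow brings the total to 6, matching (c).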