saderagsdale closed this 6 months ago
Need to update logging links and KPIs @saderagsdale @data-doge
Needs KPIs for NOD production validation. Devs will chat tomorrow.
Note: The changes in V2 will only affect the submit action, which we are already monitoring and sending alerts for in our NOD dashboard. So we are all good to go for this ticket, once launched.
@data-doge can you tag me when the KPIs we're monitoring for the release are listed and linked in the ticket?
@data-doge just waiting for updated KPIs for this one.
Sorry for the delay on this @saderagsdale - just updated KPIs in the PR description.
Need to update the analytics for this. @saderagsdale will reach out to Platform.
@data-doge to send another batch of UUIDs to Tim
I'll send that batch of UUIDs to Tim tomorrow - ran out of time today
Sade will add links from the release plan draft.
Eugene and Sade will update with new monitor links (as needed).
Need to add S3 auditing for parity.
@saderagsdale I made one update to the ticket description:
Traffic on the NOD/SC evidence submission endpoint is greater than or equal to 100 uploads/day (previously 150). Tracked via this DataDog alert.
Our monitor was too sensitive and was producing too many false positives, so I lowered the evidence submission threshold to 100/day.
Dialed back down to 0% due to an issue on the LH side; may dial back up today. Just tested a fix from LH, so we'll see what they say.
Back up to 25% as of yesterday afternoon; monitors look pretty good so far.
Julie gave us the OK to increase to 50% 🎉
Increased to 50% 👍
At 100%, things are looking good
How to use this ticket
This is a daily checklist for monitoring the health of our release. Complete the tasks in this ticket first thing each day, and report the results to the internal team and key stakeholders. The metrics below are used to gauge the health of the release. See these links for details on the release and rollback plan and the incident monitoring plan.
Step 1: Check the monitoring dashboards for release health
Dashboards
NOD Dashboard in DataDog
Launch Monitoring (V1 v. V2)
NOD Domo Dashboard
Are users abandoning the form disproportionately between V1 and V2?
Is the error rate showing any spikes?
Is the contestable issues endpoint working?
Is submission traffic proportionate to the typical submission rate on a given day?
Is evidence submission proportionate to the typical submission rate on a given day?
Do our evidence uploads match Lighthouse's count?
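A possible way to answer the parity question, assuming both our side and Lighthouse can export the UUIDs of evidence uploads for the same day, is a plain set comparison. This is a hypothetical sketch: the function names and the export format are assumptions, not existing team tooling. It also encodes the 100 uploads/day floor from the DataDog alert as a simple threshold check.

```python
# Hypothetical helpers for the daily checklist; names and inputs are assumptions.

# Floor taken from the ticket: evidence submissions should be >= 100 uploads/day.
DAILY_UPLOAD_THRESHOLD = 100


def meets_daily_threshold(upload_count: int, threshold: int = DAILY_UPLOAD_THRESHOLD) -> bool:
    """Return True if the day's evidence upload count is at or above the threshold."""
    return upload_count >= threshold


def upload_parity(ours: set[str], lighthouse: set[str]) -> dict[str, set[str]]:
    """Compare two sets of upload UUIDs and report what each side is missing."""
    return {
        "missing_in_lighthouse": ours - lighthouse,
        "missing_in_ours": lighthouse - ours,
    }


def counts_match(ours: set[str], lighthouse: set[str]) -> bool:
    """True only when both sides recorded exactly the same UUIDs."""
    report = upload_parity(ours, lighthouse)
    return not report["missing_in_lighthouse"] and not report["missing_in_ours"]


if __name__ == "__main__":
    # Placeholder UUIDs for illustration only.
    ours = {"uuid-1", "uuid-2", "uuid-3"}
    lighthouse = {"uuid-1", "uuid-2"}
    print(upload_parity(ours, lighthouse))
    # {'missing_in_lighthouse': {'uuid-3'}, 'missing_in_ours': set()}
```

Comparing UUIDs rather than raw counts shows which specific uploads diverge, which is easier to hand off to the LH team than a bare count mismatch.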
Alerts are posted automatically in the #benefits-decision-reviews-notifications Slack channel and sent to team members via email. Eugene and Sade are responsible for triaging these alerts daily.
Step 2: Document new bugs or spikes using this template and list them below.
Step 3: Update the enablement team and LH Banana Peels team
Step 4: Confirm when release period is officially closed