department-of-veterans-affairs / va.gov-team


NOD | V2 Launch Monitor #64642

Closed by saderagsdale 6 months ago

saderagsdale commented 1 year ago

How to use this ticket

This is a daily checklist for monitoring the health of our release. Complete the tasks in this ticket first thing each day and report the results to the internal team and key stakeholders. The metrics used to gauge the health of the release are listed below. See the linked release and rollback plan and the incident monitoring plan for details.

Step 1: Check the monitoring dashboards for release health

Dashboards

NOD Dashboard in DataDog Launch Monitoring (V1 v. V2)

NOD Domo Dashboard

Are users abandoning the form at a disproportionate rate in V2 compared with V1?

Is the error rate showing any spikes?

Is the contestable issues endpoint working?

Is submission traffic proportionate to the typical submission rate on a given day?

Is evidence submission proportionate to the typical submission rate on a given day?

Do our evidence uploads match Lighthouse's count?
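Two of the checks above (V1 vs. V2 abandonment and upload parity with Lighthouse) can be sketched as simple comparisons. This is only an illustration: the `FormStats` shape, the function names, and the 5-point tolerance are assumptions, not part of the team's dashboards or monitors, and the real counts would come from DataDog and Domo.

```python
from dataclasses import dataclass


@dataclass
class FormStats:
    """Daily counts for one form version (hypothetical shape)."""
    started: int
    submitted: int

    @property
    def abandonment_rate(self) -> float:
        # Share of started forms that were never submitted.
        if self.started == 0:
            return 0.0
        return (self.started - self.submitted) / self.started


def abandonment_gap_ok(v1: FormStats, v2: FormStats, max_gap: float = 0.05) -> bool:
    """True when V2 abandonment exceeds V1 by no more than max_gap.

    max_gap is an assumed tolerance, not a documented KPI.
    """
    return v2.abandonment_rate - v1.abandonment_rate <= max_gap


def upload_counts_match(ours: int, lighthouse: int) -> bool:
    """True when our evidence upload count equals Lighthouse's count."""
    return ours == lighthouse
```

A failed check here would correspond to a "no" answer on the checklist and would be documented under Step 2.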

Alerts will be automatically posted in the #benefits-decision-reviews-notifications Slack channel, and sent to team members via email. Eugene and Sade will be responsible for triaging these alerts daily.

Step 2: Document new bugs or spikes using this template and list them below.

Step 3: Update the enablement team and LH Banana Peels team

Step 4: Confirm when release period is officially closed

saderagsdale commented 1 year ago

Need to update logging links, KPIs @saderagsdale @data-doge

saderagsdale commented 1 year ago

Needs KPIs for NOD production validation. Devs will chat tomorrow.

data-doge commented 1 year ago

Note: The changes in V2 will only affect the submit action, which we are already monitoring and sending alerts for in our NOD dashboard. So we are all good to go for this ticket, once launched.

saderagsdale commented 1 year ago

@data-doge can you tag me when the KPIs we're monitoring for the release are listed and linked in the ticket?

saderagsdale commented 1 year ago

@data-doge just waiting for updated KPIs for this one.

data-doge commented 1 year ago

Sorry for the delay on this @saderagsdale - just updated KPIs in the PR description.

saderagsdale commented 1 year ago

Need to update the analytics for this. @saderagsdale will reach out to platform

HeatherWidmont commented 1 year ago

@data-doge to send another batch of UUIDs to Tim

data-doge commented 1 year ago

I'll send that batch of UUIDs to Tim tomorrow - ran out of time today

saderagsdale commented 9 months ago

Sade will add links from release plan draft.

saderagsdale commented 9 months ago

Eugene and Sade will update with new monitor links (as needed).

saderagsdale commented 9 months ago

Need to add S3 auditing for parity.

data-doge commented 8 months ago

@saderagsdale Made just one update to ticket desc:

Traffic on NOD/SC evidence submission endpoint greater than or equal to ~150~ 100 uploads / day. Tracked via this DataDog alert.

Our monitor was too sensitive and was giving too many false positives, so I changed the evidence submission threshold to 100 / day.
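The effect of lowering the threshold can be sketched as follows. This is an illustration only: the actual check is a DataDog alert, and the function name and the sample counts in the usage note are hypothetical.

```python
# Minimum healthy evidence uploads per day, lowered from 150 per the comment above.
EVIDENCE_UPLOADS_PER_DAY_MIN = 100


def below_threshold_days(
    daily_uploads: dict[str, int],
    minimum: int = EVIDENCE_UPLOADS_PER_DAY_MIN,
) -> list[str]:
    """Return the days whose upload count falls under the minimum."""
    return [day for day, count in daily_uploads.items() if count < minimum]
```

With made-up counts such as `{"mon": 160, "tue": 120, "wed": 90}`, a minimum of 150 would flag both `tue` and `wed`, while a minimum of 100 flags only `wed`. That is the kind of reduction in noisy alerts the change was aiming for.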

HeatherWidmont commented 8 months ago

Dialed back down to 0% due to an issue on the LH side; may dial back up today. Just tested a fix from LH, so we'll see what they say.

HeatherWidmont commented 8 months ago

Back up to 25% as of yesterday afternoon, monitors look pretty good so far

anniebtran commented 8 months ago

Julie gave us the OK to increase to 50% 🎉

anniebtran commented 8 months ago

Increased to 50% 👍

HeatherWidmont commented 7 months ago

At 100%, things are looking good