department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

526 Logging and Error Reporting #60952

Closed SamStuckey closed 8 months ago

SamStuckey commented 1 year ago

Product Outline

High Level User Story

This work is the first of two steps to provide general clarity into our 526 form flow. This step will generate information. In the next step (out of scope for this epic), we will better organize that information into actionable data via Sentry dashboards and DataDog(?).

Current:

We have identified a list of discrete submission actions inside our 526 form flow that represent potential "black holes" for lost information. This problem was first identified with regard to overall 526 form submissions being "lost". A "paper" submission failover solution has been added to reduce our number of failed submissions, but we still have many other discrete, unreported or underreported possible points of failure in the 526 app flow.

Future:

We have clear action logging around discrete submission actions. The output of this work will enable us to create more meaningful dashboards.

In scope

Wrap KPIs in logging and send the information to our log files (Rails.logger).
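As a rough illustration of what "wrapping a KPI in logging" could look like, here is a minimal sketch. The helper name `with_kpi_logging`, the action name, and the payload fields are all hypothetical and are not taken from the 526 codebase. A stdlib `Logger` stands in for `Rails.logger` so the snippet runs outside Rails.

```ruby
require 'logger'
require 'json'

# Hypothetical helper (not from the actual 526 code): wraps one discrete
# submission action in "started" / "succeeded" / "failed" log lines.
# In a Rails app the logger passed in would be Rails.logger.
def with_kpi_logging(logger, action, context = {})
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  logger.info({ kpi: action, status: 'started' }.merge(context).to_json)

  result = yield

  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
  logger.info({ kpi: action, status: 'succeeded', duration_ms: elapsed_ms }.merge(context).to_json)
  result
rescue StandardError => e
  # Log the failure, then re-raise so existing error handling still runs.
  logger.error({ kpi: action, status: 'failed',
                 error_class: e.class.name, error_message: e.message }.merge(context).to_json)
  raise
end

# Example usage with an assumed action name and context.
logger = Logger.new($stdout)
with_kpi_logging(logger, 'example_submission_action', submission_id: 123) do
  :ok # the real work for this KPI would go here
end
```

Emitting one JSON object per line keeps the entries easy to filter later, which fits the follow-up dashboard work without committing to any particular Sentry or DataDog setup.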

Out of scope

Organizing this information into actionable data in Sentry / DataDog dashboards.

Hypothesis

If we make these changes, the enhanced visibility into the "under the hood" actions of our 526 app will create new insights and actionable data, which in turn will allow us to iterate on opportunities including bounce rate investigations, general debugging, application health monitoring, and improved failure notifications.

Definition of done

Each of the KPIs (Key Points of Interest) listed below has been investigated and (if required) reinforced with the appropriate action logging and error handling. Changes and findings are documented:

Documentation

Use the following document to track research into each KPI / ticket.

We want the following Metrics/Data for each KPI action:

Tips for workflow

The assigned dev should ask themselves:

The desired result of this information, once it's organized in a dashboard, is to be able to look at it and see at a glance, "is my 'thing' healthy?" If the answer is NO, we want every possible bit of data that could help investigate.

Historical Context

This epic spawned this work, which resulted in these two tickets for handoff:

  1. Implement Logging
  2. Refine Sentry Dashboards

It was determined that ticket #2 should (and could) be done in an iteration after we have a clear target for what we want our Sentry dashboards to look like.
Ticket 1 lists several versions of that same dashboard work relative to Sentry and DataDog. That work was determined to be out of scope, as documented here. Research related to the handoff of the error logging work is documented here, which in turn led to the creation of this document for planning the ongoing work.

TL;DR

All of the historical work, ticketing, and research above is either encapsulated by the action items outlined in this epic or slated for a later iteration (e.g., dashboard refinement).

KPIs

Each of these discrete actions is represented by a ticket in this epic. Each ticket will require at least investigation and documentation, and probably code enhancements to add appropriate logging.

SamStuckey commented 1 year ago

TODO - add individual tickets with context for each KPI (DONE)

SamStuckey commented 1 year ago

TODO - clean up individual ticket context. (DONE)