As a developer, I have insight into our application's discrete submission actions.
This work is the first of two steps to provide general clarity into our 526 form flow. This step will generate information. In the next step (out of scope for this epic) we will better organize that information into actionable data via Sentry dashboards and possibly DataDog.
Current:
We have identified a list of discrete submission actions inside our 526 form flow that represent potential "black holes" for lost information. This problem was first identified with regard to overall 526 form submissions being "lost". A "paper" submission failover solution has been added to reduce our number of failed submissions, but we still have many other discrete, unreported or under-reported possible points of failure in the 526 app flow.
Future:
We have clear action logging around discrete submission actions. The output of this work will enable us to create more meaningful dashboards.
In scope
Wrap KPIs in logging and send the information to our log files (Rails.logger).
Out of scope
Organizing this information into actionable data in Sentry / DataDog dashboards.
Hypothesis
If we make these changes, the enhanced visibility into the "under the hood" actions of our 526 app will create new insights and actionable data, which in turn will allow us to iterate on opportunities including bounce rate investigations, general debugging, application health monitoring, and improved failure notifications.
Definition of done
Each of the KPIs (Key Points of Interest) listed below has been investigated and (if required) reinforced with the appropriate action logging and error handling. Changes and findings are documented.
We want the following metrics/data for each KPI action:
User_uuid
Action being performed (form ID or action description)
Upstream or downstream system involved
Whether the action is retryable, and if so:
  Attempt counter
  Whether it will be retried or this is the final attempt
Success or failure status
HTTP status from the upstream/downstream service
HTTP response body if the response is NOT a 200 (successful)
Any ID that results from a create, update, or delete (IDs returned from third-party services, internal VA.gov DB record IDs)
Exception message and stack trace, if the failure happens outside (before or after) the external call
Duration
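As a rough illustration of how the fields above could be captured in one log line, here is a minimal sketch. Everything named here is an assumption for illustration only: the `ActionLogging` module, the `log_action` method, the JSON key names, and the convention that the wrapped block returns a hash with `:http_status`, `:body`, and `:id`. It uses Ruby's standard `Logger` so it runs on its own; in the app, `Rails.logger` would be passed in instead.

```ruby
require 'json'
require 'logger'

# Hypothetical helper (not an existing vets-api class): wraps one discrete
# submission action and emits a single structured log line describing it.
module ActionLogging
  def self.log_action(logger:, user_uuid:, action:, system:, attempt: 1, max_attempts: 1)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    entry = {
      user_uuid: user_uuid,
      action: action,
      system: system,
      retryable: max_attempts > 1,
      attempt: attempt,
      final_attempt: attempt >= max_attempts
    }
    begin
      # Assumption: the block returns { http_status:, body:, id: }.
      result = yield
      status = result[:http_status]
      entry[:http_status] = status
      entry[:status] = status == 200 ? 'success' : 'failure'
      # Only keep the response body when the call was not a 200.
      entry[:response_body] = result[:body] unless status == 200
      entry[:record_id] = result[:id] if result[:id]
      result
    rescue StandardError => e
      # Failure outside (before or after) the external call.
      entry[:status] = 'failure'
      entry[:exception] = e.message
      entry[:backtrace] = e.backtrace&.first(5)
      raise
    ensure
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      entry[:duration_ms] = (elapsed * 1000).round(1)
      line = entry.to_json
      entry[:status] == 'success' ? logger.info(line) : logger.error(line)
    end
  end
end

# Usage: in the Rails app, pass Rails.logger instead of a stdout logger.
demo_logger = Logger.new($stdout)
ActionLogging.log_action(
  logger: demo_logger, user_uuid: 'abc-123', action: 'form_526_submit',
  system: 'example_downstream', attempt: 1, max_attempts: 3
) do
  { http_status: 200, body: 'ok', id: 'claim-42' }
end
```

Emitting one JSON object per action keeps every field queryable later, which should make the follow-up dashboard work easier. The exception is re-raised after logging so existing retry and error handling behave as before.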
Tips for workflow
The assigned dev should ask themselves:
"What are all the possible things my thing can do?"
"What do I need to know about those things?"
The desired result of this information, once it's organized in a dashboard, is to be able to look at it and see at a glance, "Is my 'thing' healthy?" If the answer is NO, we want every possible bit of data that could help us investigate.
All of the historical work, ticketing, and research described under Historical Context is either encapsulated by the action items outlined in this epic or slated for a later iteration, e.g. dashboard refinement.
KPIs
Each of these discrete actions is represented by a ticket in this epic. Each ticket will require at least investigation and documentation, and probably code enhancements to add appropriate logging.
Product Outline
Documentation
Use the following document to track research into each KPI / ticket.
Historical Context
This epic spawned this work, which resulted in these two tickets for hand-off.
It was determined that ticket #2 should (or could) be done in an iteration after we have a clear target for what we want our Sentry dashboards to look like.
Ticket 1 lists several versions of that same dashboard work relative to Sentry and DataDog. That work was determined to be out of scope, as documented here. Research related to the hand-off of the error logging work is documented here, which in turn led to the creation of this document for planning the ongoing work.