As an OCTO principal responsible for reporting metrics on application performance,
I want to be aware of all cases where Benefits Portfolio systems result in a silent failure or require that a Veteran be notified of a system failure,
so that we can focus resources on addressing issues and ensure that teams are responding to silent failure cases, by having a metric available in Datadog that is built in a way that minimizes complexity for the implementing team.
New Feature
In all cases where a Decision Reviews system operation results in a silent failure, write out a Datadog metric that captures the issue.
A silent failure is defined as any case where a Veteran is not aware that a request they made was not completed.
The service tag value should be the application's name from the Datadog Service Catalog list of services.
The function tag value should be selected by the team and clearly represent the application function that had the silent failure. This should be the original function that failed (such as "lighthouse evidence upload" or "appeal submission"), not a failed notification step (such as "VANotify").
We anticipate two cases where this scenario would happen:
The system has a silent failure issue and hasn't been updated to notify the Veteran of the issue.
The system had a silent failure, and the attempt to contact the Veteran to notify them of the issue failed.
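A minimal sketch of how a team might emit this metric with the two required tags. The metric name `silent_failure`, the `MetricsClient` interface, the `InMemoryClient` stand-in, and the service name `decision-reviews` are assumptions for illustration; this ticket does not specify them. In a real application the client would be whatever StatsD-style Datadog client the team already uses.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

# Assumed metric name; the ticket text does not name the metric.
SILENT_FAILURE_METRIC = "silent_failure"


class MetricsClient(Protocol):
    """Hypothetical interface for a StatsD-style counter client."""

    def increment(self, metric: str, tags: List[str]) -> None: ...


@dataclass
class InMemoryClient:
    """Stand-in client that records calls instead of sending them to Datadog."""

    calls: List[Tuple[str, List[str]]] = field(default_factory=list)

    def increment(self, metric: str, tags: List[str]) -> None:
        self.calls.append((metric, list(tags)))


def build_tags(service: str, function: str) -> List[str]:
    """Build the service and function tags, rejecting empty values."""
    if not service.strip() or not function.strip():
        raise ValueError("both service and function tags are required")
    return [f"service:{service}", f"function:{function}"]


def record_silent_failure(client: MetricsClient, service: str, function: str) -> None:
    """Count one silent failure for the original function that failed."""
    client.increment(SILENT_FAILURE_METRIC, tags=build_tags(service, function))


# Example usage with a hypothetical service name.
client = InMemoryClient()
record_silent_failure(client, "decision-reviews", "appeal submission")
```

Validating the tags at the point of emission is one way to keep untagged metrics from reaching the dashboard; teams may prefer to enforce this in review or tests instead.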
In all cases where a BMT system operation would have resulted in a silent failure, but that silent failure was avoided by notifying the Veteran of the issue, write out a Datadog metric that captures the avoided issue.
The service and function tags should follow the guidelines listed above.
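The decision between the two metrics can be sketched as below, under stated assumptions: the metric names `silent_failure` and `silent_failure_avoided`, the helper `run_with_failure_tracking`, and the `emit` callback are hypothetical and not defined by this ticket. Passing `notify_veteran=None` models a system that has not yet been updated to notify the Veteran.

```python
from typing import Callable, List, Optional

# Assumed metric names; the ticket text does not name the metrics.
SILENT_FAILURE = "silent_failure"
SILENT_FAILURE_AVOIDED = "silent_failure_avoided"

Emit = Callable[[str, List[str]], None]


def run_with_failure_tracking(
    operation: Callable[[], None],
    notify_veteran: Optional[Callable[[], None]],
    emit: Emit,
    service: str,
    function: str,
) -> bool:
    """Run an operation; on failure, emit exactly one of the two metrics.

    Returns True if the operation succeeded, False otherwise.
    """
    # Tags describe the original function that failed, not the notification step.
    tags = [f"service:{service}", f"function:{function}"]
    try:
        operation()
        return True
    except Exception:
        # A real implementation would also log the error here.
        pass

    if notify_veteran is None:
        # No notification path exists yet, so the failure stays silent.
        emit(SILENT_FAILURE, tags)
        return False

    try:
        notify_veteran()
    except Exception:
        # The attempt to contact the Veteran failed, so the failure stays silent.
        emit(SILENT_FAILURE, tags)
    else:
        # The Veteran was told about the failure, so the silent failure was avoided.
        emit(SILENT_FAILURE_AVOIDED, tags)
    return False
```

Keeping the tags tied to the original operation in both branches means the two counters can be compared per function on a dashboard.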
If there are cases where we cannot write these metrics, note them with the ticket creator and product owner. An example would be a case where the team only receives notification of errors by email, or where an API used by the system does not provide a success/failure response that matches this goal.
Outcome, Success Measure, KPI(S), and Tracking Link
All cases of silent failures, and cases where silent failures were avoided by sending the Veteran a notification of a failure, are tracked with the metrics listed above.
All metrics are properly tagged for application and function.
Design
Add here
Enablement team (if needed)
@va-albers
Product Owner
@amylai-va
Engineering
Add here
Out of scope
Add here
Open questions
Add here
Tasks
[ ] Task
[ ] Task
[ ] Task
Definition of Done
[ ] Meets acceptance criteria
[ ] Passed E2E testing (90% coverage)
[ ] Passed unit testing (90% coverage)
[ ] Passed integration testing (if applicable)
[ ] Code reviewed (internal)
[ ] Submitted to staging
[ ] Team approved production verification process
[ ] Design performs design QA and verifies the implementation matches the design spec
[ ] Accessibility specialist performs accessibility review (in code or design)
[ ] Engineering identifies staging users required to test and shares account and credentials with design and product
[ ] Product performs functional QA and verifies acceptance criteria were met