The following features need to be evaluated to determine whether they meet the 'zero silent failures' standard, since each is a user-facing transaction submitted to a back-end system.
OCTODE guidance states:
Does your application have a user-facing transaction that is submitted to a back-end system?
NOTE: This is not limited to online forms! Other examples can include:
Uploads of documents and/or attachments
Performing an action (such as refilling a prescription or ordering supplies)
Are you using any of the listed APIs?
Lighthouse Appeals Status
Lighthouse Benefits Documents API
Lighthouse Benefits Intake API / Central Mail
Lighthouse Decision Reviews
EVSS Document Upload
Does your application submit to an API that relies on Sidekiq (or another background job processor)?
[x] Do you know when your application shipped to production?
If not, use GitHub to roughly determine when your application shipped to users.
Answer
Facility status: October 22, 2021
Facility services: May 30, 2023
[x] Did your application use the same APIs when it shipped as it does today?
If not, then you'll need to consider the path user data took through both the current architecture and the previous architecture. You will need to account for potential failures in all paths since your application shipped.
Answer
Yes. It uses the same API, but now calls a newer version (v1 instead of v0). Issues that arose during the transition have been resolved.
Monitoring
[ ] Do you monitor the API that you submit to via Datadog?
N/A (see above)
⚠️ Failure to have endpoint monitoring in place is a blocking QA standard at Staging review as of 9/10/24. If you answered no to any of the questions above, you will be blocked from shipping at the Staging review touchpoint in Collab Cycle.
Answer
😱
Reporting errors
[ ] Have you filed issues for errors that are appearing in Datadog / Slack?
If not, then start filing GitHub issues for new categories of errors following this guidance
[ ] Do all fatal errors thrown in your application end up visible to the end user either in the user interface or via email?
If not, then file GitHub issues to capture error categories following this guidance
Documentation
[ ] Do you have a diagram of the submission path that user data your application accepts takes to reach a system of record?
[ ] Do you understand how errors are handled when each system in the submission path fails, is down for maintenance, or is completely down?
If not, then create documentation that captures how errors in each system are handled. Detail which systems retry a submission and what happens when those retries exhaust. Show this in your diagram.
[ ] Has the owner of the system of record receiving the user's data indicated in writing that their system notifies or resolves 100% of fatal errors once in their custody?
If not, work with OCTO to meet with the owner of the system and get their agreement in writing.
Please document the outcome of this conversation in your product's documentation in GitHub.
User experience
[ ] Do you capture all of the potential points of failure and make those errors known to the user via email notification and/or through the application on VA.gov or the mobile application?
If not, don't worry. Few teams are doing this, and we'll be providing resources to help you do this in your application. Proceed to create a user data flow diagram. That diagram will help us help you and your team create this user experience.
[ ] Create a user data flow diagram
Creating a user data flow diagram is a requirement of the Zero silent errors initiative and will be a required asset at the Architecture Intent touchpoint of the Engineering and Security track of Collaboration Cycle.
We want to know about your silent errors so that we can help you to fix them. To do this, follow the process in the Managing Errors document.
We don't have any silent errors!
Great! Please let us know that you went through the checklist above as a team and did not find any silent failures in our Slack channel: #zero-silent-failures. You don't have to hang out in there once you have notified us. Just pop in, tell us who you are (which team and in which portfolio) and that no failures were found. Thanks!
Description
Problem Statement:
Artifacts
User story
AS A I WANT SO THAT
Engineering notes / background
If you need to set up monitoring in Datadog:
Set up monitoring in Datadog
Follow this guidance on endpoint monitoring to get going. Then follow the guidance on monitoring performance to get up to speed with Datadog.
Examples
Additional examples
Analytics considerations
Quality / testing notes
Acceptance criteria
Checklist
Start
Answer
Answer
Monitoring
[ ] Do you monitor the API that you submit to via Datadog?
Answer
[ ] Does your Datadog monitoring use the appropriate tagging?
Answer
[ ] Do errors detected by Datadog go into a Slack notifications channel?
Answer
[ ] Does more than one person look at the Slack notifications channel containing errors on a daily basis?
Answer
[ ] Do the team members monitoring the Slack channel have a system for acknowledging and responding to the errors that appear there?
Answer
Answer
User experience
Learn how to create a user data flow diagram