Closed by pacerwow 1 month ago
In this thread, please link me to any document or artifact you've worked on related to the submission flow or failure/error cases. Even if they're out of date or partial, they'll help me as I take a first pass at the checklist for zero silent failures. I have a few already but am just casting a wide net! More is better! Looking for your pile of links before EOD today.
Sam Stuckey
features that support / power the safety net:
Remediation era docs
Kyle Soskin
doc from a while ago of stuff I thought should be monitored, and how. Not sure if any of it got made. I feel like I probably had a lot of other stuff, but will have to dig around for it.
Aurora Hampton
non-diagram things
Nathan Burgess
I believe all I had were the two diagrams in the bottom left of this page, which include my initial discovery of where I believed the silent failures were, plus a draft of a "state machine" that could track the status of all of the ancillary documents. Our understanding of the failure points may have changed in the meantime, though; this was back at the beginning of the year, when I first noticed where things could be slipping through the cracks.
mural.co: 526 Claim Submission Migration - Polling
So that would be "(Current) Ancillary Jobs Flow, Retry and Fail State Diagram" and "(Future) Ancillary Actions State Machine".
Scott
Veteran Document Upload Silent Failure Discovery (original draft; numbers and process do not reflect the most recent findings and methodology, to be documented this sprint)
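For anyone picking up the "state machine" idea mentioned in the thread, here is a rough sketch of what tracking an ancillary job's status could look like. It is not taken from the Mural diagrams: the state names, the `AncillaryJob` class, the `max_retries` default, and the in-memory `alerts` list are all hypothetical stand-ins. The only point it illustrates is that a job which exhausts its retries lands in an explicit failed state and produces an alert, rather than disappearing quietly.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical states; the real diagrams may use different ones.
    PENDING = "pending"
    RETRYING = "retrying"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions. SUCCEEDED and FAILED are terminal.
ALLOWED = {
    JobState.PENDING: {JobState.RETRYING, JobState.SUCCEEDED, JobState.FAILED},
    JobState.RETRYING: {JobState.RETRYING, JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}


class AncillaryJob:
    """Tracks one ancillary job (for example, a document upload)."""

    def __init__(self, name, max_retries=3):
        self.name = name
        self.max_retries = max_retries  # assumed retry budget
        self.retries = 0
        self.state = JobState.PENDING
        self.alerts = []  # stand-in for a real monitoring/notification hook

    def _move(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state

    def record_success(self):
        self._move(JobState.SUCCEEDED)

    def record_error(self, reason):
        # Retry while budget remains; otherwise fail loudly.
        if self.retries < self.max_retries:
            self.retries += 1
            self._move(JobState.RETRYING)
        else:
            self._move(JobState.FAILED)
            self.alerts.append(
                f"{self.name} failed after {self.retries} retries: {reason}"
            )
```

With `max_retries=2`, the third call to `record_error` moves the job to `FAILED` and records exactly one alert. In a real system the `alerts` list would presumably be replaced by whatever monitoring and user notification the team settles on.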
I moved this checklist into a markdown file in the sensitive repo since it will need to live on beyond the lifecycle of this ticket:
Issue Description
As a DBEX team member, I want to complete the Zero Silent Failures (ZSF) checklist (below) to ensure that we provide the very best Veteran experience. The activities listed below will be completed, any future product, design, or engineering work needed to address ZSF will be identified, and tickets will be created for this work with clear action items.
Tasks
Acceptance Criteria
Additional Information
Text-based documentation is saved in products/disability/526ez
Checklist
Start
Monitoring
⚠️ Failure to have endpoint monitoring in place is a blocking QA standard at Staging review as of 9/10/24. If you answered no to any of the questions above, you will be blocked from shipping at the Staging review touchpoint in Collab Cycle.
Reporting errors
Documentation
User experience
We don't have any silent errors!
Great! Please let us know in our Slack channel, #zero-silent-failures, that you went through the checklist above as a team and did not find any silent failures, and send us a link to a copy of this completed checklist. If you don't connect to a backend system, you don't need to fill out the checklist, but let us know in your message. You don't have to hang out in there once you have notified us. Just pop in, tell us who you are (which team and in which portfolio) and that no failures were found. Thanks!
Additional details
Set up monitoring in Datadog
Follow this guidance on endpoint monitoring to get going. Then follow the guidance on monitoring performance to get up to speed with Datadog.
Examples
Additional examples
File silent error issues in GitHub
We want to know about your silent errors so that we can help you to fix them. To do this, follow the process in the Managing Errors document.
How to create a user data flow diagram
Creating a user data flow diagram is a requirement of the Zero silent errors initiative and will be a required asset at the Architecture Intent touchpoint of the Engineering and Security track of Collaboration Cycle.
Learn how to create a user data flow diagram