CDCgov / prime-reportstream

ReportStream is a public intermediary tool for delivery of data between different parts of the healthcare ecosystem.
https://reportstream.cdc.gov
Creative Commons Zero v1.0 Universal

Implement high priority sender observability needs from external message monitoring report results #16247

Open brandonnava opened 4 weeks ago

brandonnava commented 4 weeks ago

The external message monitoring research report was an effort to understand the message observability and monitoring needs of our various senders, and what we can do to support those needs, similar to our earlier internal message monitoring work. Sections like 'Which scenarios matter for our sender audience?' and 'What types of info do they need during those scenarios?' reveal a few themes. These themes are distilled in the 'Insights' section and form the basis for a list of possible recommendations at the end of the report. This epic tracks the highest-priority recommendations we plan to pursue first. Those three are:

1) Provide some form of ACK/NACK messaging to senders, starting with an investigation into whether LOI ACKs meet sender observability needs or whether further sender notification is needed.
2) Train the support team on handling message errors so they can tell the sender what to do without needing to escalate to Engagement engineers, freeing up engineering capacity.
3) Refine and clarify Submission History API responses and documentation, especially for handling errors. This increases the chance that senders can fix errors in their messages themselves, both during onboarding and when troubleshooting issues that arise while regularly submitting results.

jsutantio commented 15 hours ago

The following message was provided to the support team on Oct 1, 2024:

As a result of the external message monitoring research, one of the next steps was to grant y'all more access and visibility into the tools that the Engagement Engineers use. The ultimate goal is to empower the support team with the resources and training needed to reduce the number of tickets that must be escalated to the Eng Engs (who are totally swamped these upcoming quarters). This will start with:

  1. Getting the support team access to Azure App Insights, which I believe grants you greater visibility and traceability of messages than Metabase. There may be delays in getting access, since requests go through our new DevOps team and then through CDC.
  2. Attending a training demo with an Eng Eng, who will walk y'all through Azure App Insights and typical usage scenarios.
  3. Identifying specific issue categories that typically require tier 2 escalation but could be handled at tier 1 through training with an Eng Eng and updates to the SOP/support guide. My guess is that these will be issues categorized under Data-Validation, Confirmation-Verification, and Data-Resubmission, as those are the most frequent categories so far.