Open swirtSJW opened 2 years ago
Completed with the addition of this UX monitoring browser test in datadog:
Noting:
This original monitor was among the first of its kind for Sitewide. It predates Code Yellow / Watch Officer, and it predates having teams / dashboards in Facilities, so it alarmed to the #oncall channel in DSVA Slack for Platform response, and the Facilities team was not responsible for triage, afaik.
The ticket here was to let us know if Veterans don't get Facility Locator response as expected from a browser test. We now have a few other monitors on Facility Locator API endpoints that will alert us to anomalies in traffic. We don't have any other browser synthetic test.
It might be worth reviving this and creating a similar new monitor if Plat can't help us surface the old one / revise it. Plat thread opened here about the fact it's gone now: https://dsva.slack.com/archives/CBU0KDSB1/p1705532707740479
FYI @xiongjaneg
From Plat:
the resources pointed to by this url do not exist. You can go ahead and recreate it.
Reopening to track in the backlog.
Noting other resources are available from the datadog channel, Adrian Rollett, etc.
Please add your planning poker estimate with Zenhub @eselkin
I've tried creating a synthetic test for facility locator and it doesn't work. The browser synthetic test can no longer load WebGL which it needs for facility locator.
Noted you'd flagged that concern before, when this came up in refinement today. @eselkin I'd love to get a clear sense of what is different now than when the original synthetic monitor was created / worked. I don't not believe you (double neg?) just we need to sort out how it worked before / doesn't now, and if there's something Datadog owners could enable that would unblock it, or if WebGL is needed, there's potential we could file this as a feature request with Datadog on behalf of VA as well, etc. Any screenshots or something of what happens when we try could probably help with sorting that out.
@xiongjaneg two additional notes from refinement which may need to be reflected in AC
Please add your planning poker estimate with Zenhub @maxx1128
@jilladams I created a synthetic test here when we were noticing issues: https://vagov.ddog-gov.com/synthetics/details/apx-2wv-n92?from_ts=1705615225518&to_ts=1706220025518&live=true
@mmiddaugh Name of the monitor I created is [Facilties] Facility Locator
You can see it says "PASSED" (I tried running the test with all browsers, but all had the same issue),
but the screenshot and error messages at the bottom of the page tell the whole story. The screenshot shows no loaded Facility Locator because of the errors.
The errors show:
@jilladams @eselkin This doesn't have points or a sprint assigned. Is this actually something that should be considered for Sprint 4?
Same here: weird workflow problem, not sure what happened. Moving to backlog.
Noting: today the Facility Locator experienced a spike in 403 errors from the new Facilities-api v2 endpoint: https://vagov.ddog-gov.com/apm/services/facility-locator/operations/rack.request/resources?dependencyMap=qson%3A%28data%3A%28telemetrySelection%3Aall_sources%29%2Cversion%3A%210%29&env=eks-prod&fromUser=true&groupMapByOperation=null&panels=qson%3A%28data%3A%28%29%2Cversion%3A%210%29&resources=qson%3A%28data%3A%28visible%3A%21t%2Chits%3A%28selected%3Atotal%29%2Cerrors%3A%28selected%3Atotal%29%2Clatency%3A%28selected%3Ap95%29%2CtopN%3A%215%29%2Cversion%3A%211%29&summary=qson%3A%28data%3A%28visible%3A%21t%2Cerrors%3A%28selected%3Acount%29%2Chits%3A%28selected%3Acount%29%2Clatency%3A%28selected%3Alatency%2Cslot%3A%28agg%3A75%29%2Cdistribution%3A%28isLogScale%3A%21f%29%2CshowTraceOutliers%3A%21f%29%2Csublayer%3A%28slot%3A%28layers%3Aservice%29%2Cselected%3Apercentage%29%29%2Cversion%3A%211%29&view=spans&start=1716924577835&end=1716926350000&paused=true
We are still triaging, and it's not clear exactly what happened / is happening here, but we should potentially prioritize Facility Locator monitoring, pending the results of what we find today.
User Story or Problem Statement
As a maintainer of the Facility Locator, I want synthetic monitoring of the Facility Locator so that we are alerted if Veterans are not getting results for their queries.
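To make the "not getting results" condition concrete, here is a minimal sketch of the check a synthetic test could apply to a search response. It is an illustration only: the function name `facility_search_is_healthy` is hypothetical, and the assumption that results arrive as a list under a top-level `data` key has not been verified against the Facility Locator API in this ticket. The point is that a test should assert on returned results, not only on the steps completing.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Optional


def facility_search_is_healthy(status_code: int, body: Optional[Mapping[str, Any]]) -> bool:
    """Return True only if the search succeeded AND returned at least one facility.

    Assumption (not confirmed in this ticket): results are a list under the
    top-level "data" key of the parsed JSON body.
    """
    # Any non-200 status (for example a 403) counts as unhealthy.
    if status_code != 200:
        return False
    # A missing or non-object body cannot contain results.
    if not isinstance(body, Mapping):
        return False
    results = body.get("data")  # assumed field name
    # An empty result list is treated as a failure for a known-good query.
    return isinstance(results, list) and len(results) > 0
```

For a monitor, this would be run against a query that is expected to always return facilities, so that an empty list signals a problem and not a legitimately empty search.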
Recommended as a result of Postmortem
Acceptance Criteria