Open jilladams opened 2 weeks ago
Some notes: Response latency ranged from 2071ms to 54273ms.
I would recommend either: increasing the response time threshold from the current 1000ms, or increasing the timeframe the test must spend in a failed state before we alert. Right now this timeframe is 5 minutes.
Either of these would be fairly easy to implement.
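To make the trade-off between the two options concrete, here is a minimal sketch in Python. It is not Datadog's evaluation logic or API. It assumes one check per minute and treats "failed state" as consecutive checks over the latency threshold. The function names and the sample latencies are hypothetical; only 2071ms (the low end of the observed range) and the 1000ms / 5 minute settings come from this ticket.

```python
from typing import Sequence


def longest_failed_run(latencies_ms: Sequence[float], threshold_ms: float) -> int:
    """Return the longest streak of consecutive checks slower than threshold_ms."""
    longest = 0
    current = 0
    for latency in latencies_ms:
        if latency > threshold_ms:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def would_alert(
    latencies_ms: Sequence[float],
    threshold_ms: float,
    min_failed_checks: int,
) -> bool:
    """Alert only if the check stayed over threshold for min_failed_checks in a row.

    Assumption: one check per minute, so min_failed_checks is roughly the
    number of minutes the test must stay failed before alerting.
    """
    if min_failed_checks < 1:
        raise ValueError("min_failed_checks must be at least 1")
    return longest_failed_run(latencies_ms, threshold_ms) >= min_failed_checks


# Hypothetical per-minute samples: a single slow response among fast ones.
single_blip = [400, 2071, 450, 500, 480, 470]
# Hypothetical sustained slowdown: every check is slow for six minutes.
sustained = [2071, 2500, 3000, 2800, 2600, 2400]

# Alerting on any single slow check fires on the blip.
blip_alerts_immediately = would_alert(single_blip, threshold_ms=1000, min_failed_checks=1)
# Option 1: a higher latency threshold ignores the blip.
blip_alerts_with_higher_threshold = would_alert(single_blip, threshold_ms=3000, min_failed_checks=1)
# Option 2: requiring a sustained failure ignores the blip but catches the slowdown.
blip_alerts_with_window = would_alert(single_blip, threshold_ms=1000, min_failed_checks=5)
sustained_alerts_with_window = would_alert(sustained, threshold_ms=1000, min_failed_checks=5)
```

In this sketch, raising the threshold and lengthening the failed-state window both suppress the one-off blip, while only a sustained slowdown still alerts. Which values to use is still to be decided when the ticket is refined.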
Status
[2024-04-25] Asked via Slack if Chris K. can estimate async so that we can pull it into Sprint 3 for Josh to work on.
User Story or Problem Statement
As a product team, I want to get Datadog alarms only if a problem is critical or ongoing, not for every blip.
Description or Additional Context
We own a Datadog synthetic monitor that sends a GET request to the vets-api /v0/forms endpoint every minute, and expects a response within 1000ms: [Synthetics] GET vets-api /v0/forms (prod)
Anytime a response takes >1000ms, the monitor alarms. That's not useful. More notes here: https://dsva.slack.com/archives/C05THHJHH2R/p1714069662900159?thread_ts=1714068389.937709&cid=C05THHJHH2R
We want to update the monitor to only alarm if ... (figure out better criteria here, when we refine this ticket)
Acceptance Criteria