Closed ph-One closed 1 year ago
Lee and Kyle are working together to move this forward. We plan to have this issue closed out soon.
@ph-One is evaluating the monitors and dashboard to ensure that AC has been met. Once that's been done, we can close this issue. :)
needs a description
metric should sum by proxy and deployment_env
consider if there is a way to use a percentage instead of absolute values for alerting (e.g., do we know the max number of requests a proxy can handle before falling over)
needs a description
metric should sum by proxy and deployment_env
consider if there is a way to use a percentage instead of absolute values for alerting (e.g., do we know the max number of requests a proxy can handle before falling over)
metric should sum by proxy and deployment_env
consider if there is a way to use a percentage instead of absolute values for alerting (e.g., do we know the max number of requests a proxy can handle before falling over)
needs a description
metric should sum by proxy and deployment_env
consider if there is a way to use a percentage instead of absolute values for alerting (e.g., do we know the max number of requests a proxy can handle before falling over)
needs a description [of what an alert from this monitor means for the person getting the alert]
update alerting chain to send to the #oncall slack channel if this is production, https://vfs.atlassian.net/wiki/spaces/OT/pages/2513633298/Create+Monitors+and+Alerts+in+Datadog#Set-up-the-Final-Alert-Chain-(staging%2C-prod-alerts)
consider removing these monitors. a low number of 2xx return codes could signify a frontend/backend used infrequently and is not necessarily an issue. this would be a good candidate for a widget (informational) but not something that should trigger an alert
https://vagov.ddog-gov.com/monitors/103404 covers 5xx, and https://vagov.ddog-gov.com/monitors/103409 checks for alive backends. Together they cover what I believe is the intent of the low-2xx monitors (no traffic, or we are erroring out)
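To make the "sum by proxy and deployment_env" and "percentage instead of absolute values" suggestions concrete, here is a rough sketch of the arithmetic in plain Python. It is not a Datadog monitor definition. The sample record shape, the function names, the 80% threshold, and `ASSUMED_MAX_REQUESTS_PER_PROXY` are all hypothetical; as noted above, the real maximum a proxy can handle is not yet known.

```python
from collections import defaultdict

# Hypothetical capacity figure. The thread only asks whether such a
# maximum is known, so this number is a placeholder, not a measurement.
ASSUMED_MAX_REQUESTS_PER_PROXY = 1000


def sum_by_proxy_and_env(samples):
    """Sum request counts per (proxy, deployment_env) pair."""
    totals = defaultdict(int)
    for sample in samples:
        key = (sample["proxy"], sample["deployment_env"])
        totals[key] += sample["requests"]
    return dict(totals)


def percent_of_capacity(totals, max_requests=ASSUMED_MAX_REQUESTS_PER_PROXY):
    """Convert absolute request totals into a percentage of assumed capacity."""
    return {key: 100.0 * count / max_requests for key, count in totals.items()}


def groups_over_threshold(percentages, threshold_pct=80.0):
    """Return the (proxy, deployment_env) groups at or above the threshold."""
    return sorted(key for key, pct in percentages.items() if pct >= threshold_pct)
```

With a percentage threshold, the same alert condition could in principle be reused across proxies with different capacities, provided a trustworthy capacity number exists for each one.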
Follow-up work being done as part of https://app.zenhub.com/workspaces/platform-tech-team-2-633efe4ca5a428e5294d7ade/issues/gh/department-of-veterans-affairs/va.gov-team/53440
The forward proxy has been evaluated for monitoring needs, and monitors and alerts have been created. Per Kyle's comment above, there is a follow-up issue for tracking specific feedback. See #53440, which is scheduled for next sprint.
Let's go ahead and close this issue and its parent epic to simplify tracking. I believe the goal has been met, pending the implementation of Kyle's recommendations.
Feel free to re-open this issue if the need arises.
Description
As a platform engineer, I need Datadog monitors and alerts to be evaluated and/or set up for the forward proxies.
Acceptance Criteria
Notes