Closed ph-One closed 1 year ago
Note: This issue will carry into the next sprint. We've made significant progress, but there are several refinements needed to ensure comprehensive monitoring for EKS (Clusters & Ingress).
Also, we consulted with Platform Tech Team 1 today to inform them of the monitoring that's been put in place for EKS. They are migrating Vets-API to EKS and will be adding monitors for parts of the application that are not yet covered.
coredns
kubelet
Re: Kube proxy, I'm not sure if this is needed, but you might take a look: https://github.com/DataDog/integrations-core/blob/master/kube_proxy/README.md
https://vagov.ddog-gov.com/monitors/115057
from
https://vagov.ddog-gov.com/monitors/115078
noerror
if it doesn't mean an error
https://vagov.ddog-gov.com/monitors/115068
https://vagov.ddog-gov.com/monitors/115081
https://vagov.ddog-gov.com/monitors/115082
https://vagov.ddog-gov.com/monitors/115086
kubelet
--operation-duration-history-seconds in the kubelet config file that controls the sum metric; and there should be a metric aggregation interval in the Datadog agent config file that sets a scraping interval. I am not sure how to access either of these files to check, though.
This issue is nearly done but will stay open into the next sprint so that we can get it properly reviewed.
Multiple team members have been out of the office recently, which has impacted our ability to complete this.
But now that @ph-One is back, he should be able to review and provide feedback, and then we'll get this over the line.
Not quite complete; rolling into the next sprint.
The kubelet duration is not making sense and needs to be investigated further. Potentially, let's spin up another ticket.
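For whoever picks up the follow-up ticket, here is a rough sketch of one way to sanity-check a duration metric that is exposed as a cumulative sum. It assumes the kubelet duration is a Prometheus-style pair of ever-growing `sum` and `count` counters, which has not been confirmed for this metric. The function name and the sample values are made up for illustration. If the assumption holds, the raw sum will keep climbing and will not look like a latency; the per-interval average comes from dividing the change in sum by the change in count between two scrapes.

```python
from typing import List, Optional, Tuple


def interval_averages(samples: List[Tuple[float, float]]) -> List[Optional[float]]:
    """Average duration per operation between consecutive scrapes.

    Each sample is (cumulative_sum_seconds, cumulative_count), in scrape order.
    Returns None for an interval with no new operations or a counter reset.
    """
    averages: List[Optional[float]] = []
    for (prev_sum, prev_count), (cur_sum, cur_count) in zip(samples, samples[1:]):
        delta_sum = cur_sum - prev_sum
        delta_count = cur_count - prev_count
        if delta_count <= 0 or delta_sum < 0:
            # Nothing happened in this interval, or the counters restarted.
            averages.append(None)
        else:
            averages.append(delta_sum / delta_count)
    return averages


# Hypothetical scrapes: the sum only ever grows, but the per-interval
# average is what reads as a latency.
scrapes = [(10.0, 100), (12.0, 120), (12.0, 120), (15.0, 130)]
print(interval_averages(scrapes))
```

If the numbers on the monitor look more like the first column of `scrapes` than like the output of `interval_averages`, the monitor is probably graphing the cumulative sum rather than a per-interval latency.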
Closing this ticket and breaking out the latency monitor into its own ticket.
Description
As a platform engineer, I need Datadog monitors and alerts to be evaluated and/or set up for EKS: Clusters.
Acceptance Criteria
Notes