Open therc opened 7 years ago
Hi @therc, sorry for the delay; it looks like we lost track of this one.
For etcd monitoring, we recommend deploying the agent on the etcd nodes and running the etcd check. It offers thorough coverage of etcd (issues/PRs are welcome if you feel something is missing).
I'm adding a task to our backlog to improve coverage of the k8s control plane.
Thanks for the report :)
@hkaj since I filed the issue, I got the etcd integration working. I run it in the same singleton agent that collects events (or, rather, does not yet, but that's another story). I pass the list of etcd masters in the instances field, since our clusters have masters at fixed, known IP addresses; I understand that most others out there won't have the same luxury. There are a fair number of performance metrics, but one puzzling thing is that I can't find a way to track the number of replicas that are up. I only see metrics split by the two etcd_states, leader vs. follower. I guess this might be due to my single-instance approach, and I should just use the daemonset approach so that each etcd master is reported by a different agent? I'll try that next, but I'll leave this here in the meantime so that others who try to be too smart can find out about the problem in the issue tracker.
Never mind, I see that there is a url tag that is unique for each master... specifically for cases like mine.
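For anyone landing here with the same setup, here is a rough sketch of the idea in Python. It is not the agent's own code. It assumes each check instance is a mapping with a `url` key, that every reported sample carries a `url:<endpoint>` tag next to an `etcd_state:<leader|follower>` tag, and that etcd listens on client port 2379 over HTTPS. The IP addresses, the sample layout, and both helper names are made up for illustration.

```python
# Hypothetical master addresses; the thread only says they are fixed and known.
ETCD_MASTERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def build_instances(hosts, port=2379, scheme="https"):
    """Return one check instance per etcd master (assumed shape: {"url": ...})."""
    return [{"url": f"{scheme}://{host}:{port}"} for host in hosts]


def count_members_up(samples):
    """Count distinct `url` tag values, regardless of leader/follower state."""
    urls = set()
    for sample in samples:
        for tag in sample["tags"]:
            # Split on the first colon only, since the URL itself contains colons.
            key, _, value = tag.partition(":")
            if key == "url":
                urls.add(value)
    return len(urls)


if __name__ == "__main__":
    instances = build_instances(ETCD_MASTERS)
    # Fake samples standing in for what a single agent would report.
    samples = [
        {"tags": [f"url:{inst['url']}", f"etcd_state:{state}"]}
        for inst, state in zip(instances, ["leader", "follower", "follower"])
    ]
    print(count_members_up(samples))  # prints 3
```

The point is that a single agent polling all masters can still tell them apart: grouping by the `url` tag instead of by state gives one series per master, so the number of distinct `url` values reporting is the number of members that answered.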
I believe the Datadog Cluster Agent (DCA) was introduced for this - https://github.com/DataDog/datadog-agent/pull/983
Right now, I don't think there's any insight into what's going on with the Kubernetes control plane: