DataDog / dd-agent

Datadog Agent Version 5
https://docs.datadoghq.com/

More checks for the Kubernetes control plane #3112

Open therc opened 7 years ago

therc commented 7 years ago

Right now, I don't think there's any insight into what's going on with the Kubernetes control plane:

hkaj commented 7 years ago

Hi @therc, sorry for the delay. It looks like we lost track of this one.

Thanks for the report :)

therc commented 7 years ago

@hkaj since I filed the issue, I got the etcd integration working. I run it in the same singleton agent that collects events (or, rather, does not yet, but that's another story). I pass the list of etcd masters in the instances field, since our clusters have masters at fixed, known IP addresses. I understand that most others out there won't have the same luxury. There is a fair number of performance metrics, but one puzzling thing is that I can't find a way to track the number of replicas that are up. I only see metrics sharded by the two etcd_state values, leader vs. follower. I guess this might be due to my single-instance approach, and I should just use the daemonset approach so that each etcd master is reported by a different agent? I'll try that next, but I'll leave this here in the meantime, so that others who try to be too smart can find out about the problem in the issue tracker.
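For anyone trying the same single-agent setup, here is a minimal sketch of generating the etcd check configuration from a fixed list of masters. The addresses, port, and helper names are hypothetical, and the `url` and `timeout` keys are assumed to match the etcd check's example config, so verify them against the `etcd.yaml.example` shipped with your agent.

```python
def build_etcd_instances(master_urls, timeout=5):
    """Return one check instance (a dict) per etcd master URL."""
    return [{"url": url, "timeout": timeout} for url in master_urls]


def render_etcd_yaml(instances):
    """Render the instances as a minimal etcd.yaml body without a YAML library."""
    lines = ["init_config:", "", "instances:"]
    for inst in instances:
        lines.append('  - url: "{}"'.format(inst["url"]))
        lines.append("    timeout: {}".format(inst["timeout"]))
    return "\n".join(lines) + "\n"


# Hypothetical master addresses; replace with your cluster's fixed IPs.
MASTERS = [
    "http://10.0.0.1:2379",
    "http://10.0.0.2:2379",
    "http://10.0.0.3:2379",
]

if __name__ == "__main__":
    print(render_etcd_yaml(build_etcd_instances(MASTERS)))
```

Listing every master as its own instance lets one agent poll them all, at the cost of having to know the addresses up front.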

therc commented 7 years ago

Never mind, I see that there is a url tag that is unique for each master... specifically for cases like mine.
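To illustrate why the per-master url tag answers the question, here is a rough sketch that counts how many masters are reporting by collecting distinct url tag values. The `url:<value>` tag format, the sample data, and the function name are assumptions for illustration, not output copied from a real agent.

```python
def count_reporting_masters(tagged_series):
    """Count distinct `url:` tag values across a list of tag lists."""
    prefix = "url:"
    urls = set()
    for tags in tagged_series:
        for tag in tags:
            if tag.startswith(prefix):
                urls.add(tag[len(prefix):])
    return len(urls)


# Hypothetical tag sets, one per reported series.
SAMPLE = [
    ["etcd_state:leader", "url:http://10.0.0.1:2379"],
    ["etcd_state:follower", "url:http://10.0.0.2:2379"],
    ["etcd_state:follower", "url:http://10.0.0.3:2379"],
]

if __name__ == "__main__":
    print(count_reporting_masters(SAMPLE))
```

Grouping by etcd_state alone collapses all followers into one series, which is why the replica count was not visible until the url tag was taken into account.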

so0k commented 6 years ago

I believe the Datadog Cluster Agent (DCA) was introduced for this - https://github.com/DataDog/datadog-agent/pull/983