deis / monitor

Monitoring for Deis Workflow
https://deis.com
MIT License

deis-monitor-telegraf is opening up too many connections on kube-apiserver #118

Closed jolly2 closed 8 years ago

jolly2 commented 8 years ago

I have a 3-node cluster (Ubuntu/Kubernetes/Deis-Workflow-v2.0.0). Each node has good compute resources.

Within 10-15 minutes of a fresh installation of Deis Workflow, the k8s dashboard and k8s APIs hang. After investigating, I found that deis-monitor-telegraf (3 pods) was opening a lot of connections to kube-apiserver to collect metrics but not closing them. Running lsof -P -p repeatedly showed the number of open sockets increasing continuously until it hit 1024 (the default ulimit), at which point kube-apiserver started logging "too many open files" on ports 6443 and 8080 and the entire system hung. I tried increasing the ulimit, but it did not help. I shut down the deis-monitor-telegraf daemon set and the lsof count dropped back to a stable, minimal value. I tried both the v2 and v2-beta versions of the telegraf image. Both have the same problem.
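For anyone trying to reproduce the measurement described above, here is a rough sketch of the same check done by reading /proc instead of running lsof by hand. It is not part of Deis or telegraf. The function names (`open_fd_summary`, `soft_nofile_limit`) and the 10-second polling interval are made up for illustration. It assumes a Linux node, and reading another process's /proc/<pid>/fd typically needs root.

```python
import os
import sys
import time


def open_fd_summary(pid):
    """Return (total_fds, socket_fds) for a process, read from /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits, or None if not numeric."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                value = line.split()[3]  # columns: Max open files <soft> <hard> <units>
                return int(value) if value.isdigit() else None
    return None


if __name__ == "__main__":
    # Hypothetical usage: pass the PID of the process to watch (e.g. kube-apiserver).
    pid = int(sys.argv[1])
    while True:
        total, sockets = open_fd_summary(pid)
        print(f"pid={pid} fds={total} sockets={sockets} soft_limit={soft_nofile_limit(pid)}")
        time.sleep(10)  # polling interval chosen arbitrarily for this sketch
```

If the socket count climbs steadily toward the soft limit while telegraf is running and flattens out once the daemon set is stopped, that matches the behaviour reported here.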

Is there a way to collect the metrics on demand instead of collecting them by auto polling?

Thanks!

jchauncey commented 8 years ago

This was an issue with an older version of telegraf and should be fixed when 2.1 ships.

jchauncey commented 8 years ago

Closing, as this was fixed in 2.1.