deis / monitor

Monitoring for Deis Workflow
https://deis.com
MIT License

Telegraf seems to have a memory leak #111

Closed by mattk42 8 years ago

mattk42 commented 8 years ago

In my cluster running the 2.0 official release I have noticed that telegraf works its way up to 2GB of memory usage before it seemingly gets cleaned up and starts over.

[Chart: Memory Leak]

jchauncey commented 8 years ago

Are you noticing the pod restarting a lot? There was an issue in the 0.13.x version of telegraf that caused the binary to panic when sending data to influx - https://github.com/influxdata/telegraf/issues/1268

Check the pod logs and see if you find a similar error message.

mattk42 commented 8 years ago

Unfortunately, I already killed off the DaemonSet (DS). I have my own monitoring in place, so I shut down most of the deis-monitor components this morning.

I don't believe the container is getting restarted, though. If that were the case, the container IDs in the chart above would have changed.

titilambert commented 8 years ago

Hello, I think I have the same issue. I suspect the prometheus plugin: it doesn't seem to be closing its connections :/ Run this on your apiservers: netstat -ntp | grep 8080 | wc -l
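For anyone who wants more than a raw line count from that command, here is a minimal Python sketch that groups the matching connections by TCP state, which may make lingering connections easier to spot. The function name `count_connections` and the sample output are made up for illustration; the column layout assumed is the usual Linux `netstat -ntp` one (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). Unlike `grep 8080`, it only matches the port itself, not any line containing "8080".

```python
from collections import Counter


def count_connections(netstat_output: str, port: int = 8080) -> Counter:
    """Count TCP connections that use `port`, grouped by connection state.

    Hypothetical helper: expects text in the usual Linux `netstat -ntp` layout.
    """
    suffix = f":{port}"
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header rows; data rows start with tcp or tcp6.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        # Match the port on either end of the connection.
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


# Invented sample output, for illustration only (not taken from a real cluster).
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.1:8080           10.0.0.5:41000          ESTABLISHED 1234/kube-apiserver
tcp        0      0 10.0.0.1:8080           10.0.0.5:41002          ESTABLISHED 1234/kube-apiserver
tcp        0      0 10.0.0.1:8080           10.0.0.5:41004          CLOSE_WAIT  1234/kube-apiserver
tcp        0      0 10.0.0.1:22             10.0.0.9:50522          ESTABLISHED 999/sshd
"""

if __name__ == "__main__":
    for state, count in count_connections(SAMPLE).items():
        print(state, count)
```

To use it on a real node, pipe the command's output in yourself, for example by reading `sys.stdin` and passing the text to `count_connections`. A total that keeps climbing between runs would be consistent with connections not being closed.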

jchauncey commented 8 years ago

We have seen this happen (especially on larger clusters), so we disabled the prometheus plugin by default in the image (although the chart turns it on). With the plugin disabled, you lose k8s metrics and container metrics. I will open an issue with telegraf and see if we can get it fixed.

jchauncey commented 8 years ago

See here - https://github.com/influxdata/telegraf/issues/1405

titilambert commented 8 years ago

PR influxdata/telegraf#1406 created

jchauncey commented 8 years ago

I have rebuilt the image to include the latest master changes. It seems to have fixed the memory leak, but I'm not 100% sure of that. If you want to redeploy telegraf and check it out, that would be awesome.

titilambert commented 8 years ago

@jchauncey I'll test it today or tomorrow and let you know when I have results ;)

titilambert commented 8 years ago

The issue is fixed for me!