docker-archive / infra-container_exporter

Prometheus exporter exposing container metrics

Memory issues #17

Closed. olvesh closed this issue 7 years ago.

olvesh commented 8 years ago

It seems the container-exporter, running as a Docker container (Docker 1.9.1), is using much more memory than its cousins.

Query: avg(container_memory_usage_bytes{name=~"prom.*"}  / 1024 / 1024) by (name)
Element                                 Value
{name="prometheus-consul-exporter"}     10.352864583333334
{name="prometheus-haproxy-exporter"}    11.6328125
{name="prometheus-container-exporter"}  325.6673828125
{name="prometheus-node-exporter"}       13.4107421875
{name="prometheus-statsd-exporter"}     10.933203125
{name="prometheus"}                     5294.07421875
CONTAINER                       CPU %               MEM USAGE / LIMIT     MEM %               NET I/O               BLOCK I/O
prometheus-container-exporter   0.00%               343.7 MB / 8.372 GB   4.11%               12.43 MB / 123.6 MB   4.1 MB / 0 B
prometheus-node-exporter        0.00%               15.16 MB / 8.372 GB   0.18%               0 B / 0 B             4.305 MB / 0 B
prometheus-statsd-exporter      0.18%               11.92 MB / 8.372 GB   0.14%               1.243 GB / 68.41 MB   5.325 MB / 0 B

Uptime here is 10 days for all of the prometheus-* containers, with a scrape interval of 60 seconds.

I will try using cAdvisor instead of the container-exporter, but I'm logging this here in case someone takes up maintenance of this repo.
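
For reference, a rough way to confirm the growth trend from Prometheus itself, assuming it is reachable on localhost:9090 (adjust the address and container name for your setup):

$ curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=deriv(container_memory_usage_bytes{name="prometheus-container-exporter"}[6h])'

deriv() over a few hours should come back clearly positive for a leaking process and stay near zero for the healthy exporters.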

olvesh commented 8 years ago

The cause seems to be sockets not being closed properly. At least lsof reports:

$ wc -l container-exporter.lsof
51240 container-exporter.lsof

open files while the process was using 188.6 MB of memory.

From the attached file, container-exporter.lsof.gz:

COMMAND     PID   TID       USER   FD      TYPE    DEVICE SIZE/OFF      NODE NAME
[...]
container  8398             root    9u     sock                0,7       0t0   73789030 can't identify protocol
container  8398             root   10u     sock                0,7       0t0   73792299 can't identify protocol
container  8398             root   11u     sock                0,7       0t0   73795796 can't identify protocol
container  8398             root   12u     sock                0,7       0t0   73798629 can't identify protocol
[...]

What is also weird is that this number doesn't match the metric the exporter itself reports, process_open_fds{instance="10.20.20.244",job="prometheus-container-exporter"} 8535, although that is still high.
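
A quicker way to watch the descriptor count than a full lsof dump, assuming the exporter is still PID 8398 as above (procfs lists each descriptor exactly once, so the number lines up directly with process_open_fds):

$ ls /proc/8398/fd | wc -l
$ ls -l /proc/8398/fd | grep -c 'socket:'

The second command counts only the socket entries, which is where the leak shows up.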

KekSfabrik commented 8 years ago

:+1: I can confirm that. I just came back to work from the weekend (everything had been running for 2 days) and Prometheus had died and wouldn't start up again (too many open files). From a 236 MB lsof output:

root@core:~# cat lsof_result_22022016 | awk '{ print $2; }' | uniq -c | sort -rn | head
2304120 970 (< docker daemon)
  75369 18943 (< container exporter)
... and after restarting container exporter ...
  46200 970 (< docker daemon)

Besides that, it showed up in htop as using ~1.2% of the VM's 30 GB of memory.
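
One note for anyone reproducing the breakdown above: uniq -c only collapses adjacent duplicates, so sorting first gives exact per-PID totals, e.g.:

root@core:~# awk '{ print $2; }' lsof_result_22022016 | sort -n | uniq -c | sort -rn | head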

MrMMorris commented 8 years ago

Seeing this as well. It's using 2.6 GB of memory on one of my hosts.

PID     USER   PR   NI    VIRT       RES       SHR S   %CPU   %MEM   TIME+       COMMAND
10037   root   20   0     2775208    2.634g   3152 S   2.3    17.6   364:23.33  container-exporter

I think this actually resulted in some downtime for my Consul cluster, so not good at all.

olvesh commented 8 years ago

I am going to switch to cAdvisor from coreos, but that of course triggered another Docker bug: since it mounts /, I get some mount errors when other containers are destroyed. But the metrics are interchangeable.
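
For reference, the cAdvisor quick-start from its README at the time looks roughly like this (image name and mounts may need adjusting per host); the read-only bind of / is what triggers the mount errors mentioned above:

$ docker run -d \
    --name=cadvisor \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish=8080:8080 \
    google/cadvisor:latest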

olvesh commented 7 years ago

Closing, since this repo is deprecated.