draganm / missing-container-metrics

Prometheus exporter for container metrics cAdvisor won't give you

OOM Counter Incrementing Incorrectly #10

Open dmitrii-didenko opened 3 years ago

dmitrii-didenko commented 3 years ago

Hi!

Thank you for the project!

There seems to be a weird bug. Here is the description:

  1. Install MCM on a Kubernetes v1.21.2 cluster with the Docker runtime

  2. Port-forward one MCM pod to check its metrics:

    k port-forward monitoring-missingcm-h257l 3001:3001

  3. Connect to a container running on the same node as the MCM pod and trigger an OOM event with the stress command:

    stress --vm 1 --vm-bytes 3024M
    stress: info: [389] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
    stress: FAIL: [389] (415) <-- worker 390 got signal 9
    stress: WARN: [389] (417) now reaping child worker processes
    stress: FAIL: [389] (451) failed run completed in 2s

    Note: the command above has to be run several times to reproduce the issue.

  4. Check the container_ooms metric for that container
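
For step 4, here is a rough sketch of how the counter could be read out of the exporter's text output instead of eyeballing it. It only parses Prometheus exposition text; the label names (`name`, `pod`) and the sample values are made up for illustration, and the `/metrics` path on the forwarded port is an assumption.

```python
import re
import urllib.request

# One exposition line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def container_ooms(metrics_text, **wanted):
    """Return the values of container_ooms samples whose labels match `wanted`."""
    values = []
    for line in metrics_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != "container_ooms":
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if all(labels.get(k) == v for k, v in wanted.items()):
            values.append(float(m.group("value")))
    return values


def fetch_metrics(url="http://localhost:3001/metrics"):
    """Fetch the metrics page; the URL path is an assumption, not confirmed here."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


# Hypothetical sample text, not real exporter output.
SAMPLE = """# TYPE container_ooms counter
container_ooms{name="stress-test",pod="demo"} 13
container_ooms{name="other",pod="demo2"} 0
container_restarts{name="stress-test",pod="demo"} 1
"""

if __name__ == "__main__":
    print(container_ooms(SAMPLE, name="stress-test"))
```

With the port-forward from step 2 running, `container_ooms(fetch_metrics(), name="...")` could be called after each stress run to see how much the counter moved per run.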

Expected result: the container_ooms counter should equal the number of times the stress command was executed

Actual result: container_ooms is greater than the number of times the stress command was executed. I got a value of 13 even though I ran the command only 3 times

Additional info: I checked docker events on the node while reproducing the issue. The number of OOM events matches the number of stress runs. I also checked /var/log/messages on the node. The result is as expected: the number of OOM log entries matches the number of stress runs.
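
The docker events comparison can be made repeatable with something like the sketch below. It assumes the events were captured as one JSON object per line (for example with `docker events --filter event=oom --format '{{json .}}'`) and that each object carries `Type`, `Action` and `Actor.ID` fields, which is my understanding of Docker's JSON event output. The container ID and the sample lines are invented for illustration.

```python
import json
from collections import Counter


def count_oom_events(lines):
    """Count container OOM events per container ID from JSON-lines event output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only container-level "oom" actions are counted.
        if event.get("Type") == "container" and event.get("Action") == "oom":
            counts[event["Actor"]["ID"]] += 1
    return counts


# Hypothetical captured events: three OOMs and one unrelated event.
SAMPLE_EVENTS = [
    '{"Type": "container", "Action": "oom", "Actor": {"ID": "abc123"}}',
    '{"Type": "container", "Action": "oom", "Actor": {"ID": "abc123"}}',
    '{"Type": "container", "Action": "start", "Actor": {"ID": "abc123"}}',
    '{"Type": "container", "Action": "oom", "Actor": {"ID": "abc123"}}',
]

if __name__ == "__main__":
    for container_id, n in count_oom_events(SAMPLE_EVENTS).items():
        print(container_id, n)
```

The per-container count from this script is the number the container_ooms counter would be expected to match.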

Any idea what could be wrong here?

draganm commented 3 years ago

Hi, thank you for reporting this. Let's try to get to the bottom of it ...

questions: