draganm / missing-container-metrics

Prometheus exporter for container metrics cAdvisor won't give you
MIT License

container_ooms time series are exported indefinitely even when the container doesn't exist anymore #13

Open boriba opened 3 years ago

boriba commented 3 years ago

appVersion: 0.21.0

Let's say we have a pod "pod1" with a main container in it. One OOM kill occurs in the main container of pod1, and after the container is restarted the pod runs successfully.

The container_ooms metric is then exported as these time series:
container_ooms{pod="pod1", container_id="old_container_id_oom_killed"} 1
container_ooms{pod="pod1", container_id="new_container_id_successfully_running"} 0

The problem is that the container with container_id="old_container_id_oom_killed" doesn't exist anymore, but its time series keeps being exported indefinitely with a value of 1.

If we define an alert that triggers when container_ooms > 0, the alert keeps firing indefinitely, even though the pod was restarted successfully and its main container is already running.

When I delete pod "pod1", all time series for that pod are removed after 5 minutes, which is expected.

Expected behavior: When the pod is running successfully and the container with the container_id in which the OOM kill occurred doesn't exist anymore, the time series for that container_id should be removed after 5 minutes (so the exporter should NOT keep exporting the metric for a container that no longer exists).
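To make the expected behavior concrete, here is a minimal sketch of the pruning logic I have in mind. It is not the exporter's actual code: the OomSeriesTracker class, its method names, and the idea that the exporter records when each container_id was last observed as existing are all assumptions made for illustration. Only the 5-minute window and the label names come from the description above.

```python
from dataclasses import dataclass, field

# Assumption: same 5-minute window that already applies when a pod is deleted.
STALE_AFTER_SECONDS = 5 * 60


@dataclass
class OomSeriesTracker:
    """Hypothetical tracker for container_ooms series keyed by (pod, container_id)."""

    ooms: dict = field(default_factory=dict)       # (pod, container_id) -> OOM kill count
    last_seen: dict = field(default_factory=dict)  # (pod, container_id) -> time last observed

    def observe(self, pod, container_id, now, oom_killed=False):
        """Record that the container exists at time `now`, optionally counting an OOM kill."""
        key = (pod, container_id)
        self.ooms[key] = self.ooms.get(key, 0) + (1 if oom_killed else 0)
        self.last_seen[key] = now

    def export(self, now):
        """Return the series to expose, dropping containers not seen for 5 minutes."""
        stale = [k for k, seen in self.last_seen.items()
                 if now - seen >= STALE_AFTER_SECONDS]
        for key in stale:
            del self.ooms[key]
            del self.last_seen[key]
        return dict(self.ooms)
```

With this sketch, the series for "old_container_id_oom_killed" would still be exported right after the OOM kill, so an alert on container_ooms > 0 can fire, but it would disappear 5 minutes after the old container was last seen, while the series for the new container stays.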