Let's say we have a pod "pod1" with a main container. The main container was OOM-killed once, and after the pod restarted it is running successfully.
The container_ooms metric is exported as these time series:
container_ooms{pod="pod1", container_id="old_container_id_oom_killed"} 1
container_ooms{pod="pod1", container_id="new_container_id_successfully_running"} 0
The problem is that the container with container_id="old_container_id_oom_killed" no longer exists, but its time series persists indefinitely with a value of 1.
If we define an alert that triggers when container_ooms > 0, the alert fires indefinitely, even though the pod was restarted and its main container is already running successfully.
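To illustrate why the alert never clears, here is a minimal Python sketch that applies the same condition to a snapshot of the two series above. The `samples` list and the `firing_series` helper are hypothetical and only mimic the comparison; they are not part of the exporter or of Prometheus.

```python
# Hypothetical snapshot of the exported series, using the label values from this report.
samples = [
    ({"pod": "pod1", "container_id": "old_container_id_oom_killed"}, 1),
    ({"pod": "pod1", "container_id": "new_container_id_successfully_running"}, 0),
]


def firing_series(samples, threshold=0):
    """Return the label sets whose value exceeds the threshold (like container_ooms > 0)."""
    return [labels for labels, value in samples if value > threshold]


# The stale series for the old container keeps matching for as long as it is exported.
firing = firing_series(samples)
print(firing)
```

As long as the series for the old container is exported with value 1, the condition matches, regardless of the state of the new container.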
When I delete pod "pod1", all time series for that pod are removed after 5 minutes (which is expected).
Expected behavior:
When the pod is running successfully and the container whose container_id was OOM-killed no longer exists, the time series for that container_id should be removed after 5 minutes (so the exporter should NOT export the metric for this non-existent container anymore).
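The expected behavior could be sketched roughly as follows, in Python. This is only an illustration under assumptions: `OomSeriesStore`, `record`, and `export` are hypothetical names, and the sketch assumes the exporter refreshes a series each time it sees that the container still exists. It does not describe how the exporter is actually implemented.

```python
import time

STALE_AFTER_SECONDS = 5 * 60  # the 5-minute window mentioned in this report


class OomSeriesStore:
    """Hypothetical in-memory store of per-container OOM counts."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._series = {}  # (pod, container_id) -> (oom_count, last_seen)

    def record(self, pod, container_id, ooms):
        """Record the OOM count for a container that currently exists."""
        self._series[(pod, container_id)] = (ooms, self._clock())

    def export(self):
        """Drop series not refreshed within the window, then return the rest."""
        now = self._clock()
        self._series = {
            key: entry
            for key, entry in self._series.items()
            if now - entry[1] <= STALE_AFTER_SECONDS
        }
        return {key: entry[0] for key, entry in self._series.items()}


# Usage with a fake clock so the example does not have to wait 5 minutes.
now = [0.0]
store = OomSeriesStore(clock=lambda: now[0])
store.record("pod1", "old_container_id_oom_killed", 1)
now[0] = 60.0
store.record("pod1", "new_container_id_successfully_running", 0)
now[0] = 301.0
store.record("pod1", "new_container_id_successfully_running", 0)  # still running
exported = store.export()
print(exported)
```

With this approach the series for the old container disappears once it has not been seen for more than 5 minutes, while the series for the running container stays.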
appVersion: 0.21.0