Closed gizas closed 2 months ago
I imagine that the accuracy of this value will be hard to pinpoint, but even an approx time would be great ... from looking at the data, it's useful to know if an error (such as OOMKilled) happened 7 mins or 7 hours or 7 days ago.
Possible fields that can support that (from KSM-Pod metrics) can be
kube_pod_status_container_ready_time
.
I just tried using that field (https://github.com/elastic/beats/pull/37192), and unfortunately, they seem to come in different events, leading to separate Metricbeat entries :(
We need to look at any other potential fields or have Metricbeat somehow generate it for us.
It looks like there's a request in the kubernetes repo to log the event whenever there's an OOMKilled: https://github.com/kubernetes/kubernetes/issues/69676
I just tried using that field (https://github.com/elastic/beats/pull/37192), and unfortunately, they seem to come in different events, leading to separate Metricbeat entries :(
kube_pod_status_container_ready_time
identify the time when the Readiness probe was successful and the container is ready to a accept connections
kube_pod_completion_time
Completion time in unix timestamp for a pod. - from my understanding it is mainly used for the jobs
I've noticed there was created another issue to add
kube_pod_completion_time
for the same reason, but I believe this metric does not represent what we need - https://github.com/elastic/beats/issues/37206#issuecomment-1870506700
kube_pod_container_state_started
Start time in unix timestamp for a pod container (reffers to containerStatus.State.Terminated.StartedAt
)
kube_pod_created
Unix creation timestamp of the pod (refers to v1.Pod.CreationTimestamp
)
kube_pod_deletion_timestamp
Unix deletion timestamp (refers to v1.Pod.DeletionTimestamp
)
kube_pod_start_time
Start time in unix timestamp for a pod. (refers to v1.Pod.Status.StartTime
)
kube_pod_status_ready_time
- Readiness achieved time in unix timestamp for a pod. (v1.Pod.Status.Conditions, reported if pod condition of type type: Ready
is status: "True"
)
kube_pod_status_initialized_time
Initialized time in unix timestamp for a pod. (v1.Pod.Status.Conditions, reported if pod condition of type type: Initialized
is status: "True"
)
kube_pod_status_container_ready_time
Readiness achieved time in unix timestamp for a pod containers. (v1.Pod.Status.Conditions, reported if pod condition of type type: ContainersReady
is status: "True"
)
kube_pod_status_scheduled_time
Unix timestamp when pod moved into scheduled status (v1.Pod.Status.Conditions, reported if pod condition of type type: PodScheduled
is status: "True"
)
Example of the status.conditions that last 4 metrics above are referring to:
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-12-27T16:31:05Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-12-28T10:49:20Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-12-28T10:49:20Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-12-27T16:31:05Z"
status: "True"
type: PodScheduled
=> there is no such metric at the moment that would provide the time that last_terminated_reason of cotainer occurred
What is the interest of this issue is a cs.LastTerminationState.Terminated.FinishedAt
as I understood:
Containers:
kube-scheduler:
...
State: Running
Started: Wed, 27 Dec 2023 17:05:34 +0100
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 27 Dec 2023 13:38:18 +0100
Finished: Wed, 27 Dec 2023 17:05:33 +0100
Ready: True
Restart Count: 1
Here is a PR to report kube_pod_container_status_last_terminated_timestamp
https://github.com/kubernetes/kube-state-metrics/pull/2291
Progress:
Describe the enhancement: Provide the
time
that last_terminated_reason of cotainer occurred in Kubernetes Integration Possible fields that can support that (from KSM-Pod metrics) can be kube_pod_status_container_ready_time.Describe a specific use case for the enhancement or feature: The "kubernetes.container.status.last_terminated_reason" is a useful field, especially in the case of missed metrics. This metric on its own can be difficult to identify when it happened as the container or pod can keep this error since last restart and also can be This request is specifically for a timestamp for when this last_terminated_reason occurred.
What is the definition of done?