Open dumlutimuralp opened 1 month ago
This should be captured by the metrics for the NodeClaim Initialized status condition. You can find more detail on when this status condition is added here: https://karpenter.sh/docs/reference/metrics/#status-condition-metrics
TLDR: a node is ready for pod scheduling and use once the NodeClaim for a Karpenter node has the status condition Initialized = true. More info on what Launched/Registered/Initialized mean is here: https://karpenter.sh/docs/concepts/nodeclaims/
@dumlutimuralp It seems one can use the metric operator_status_condition_current_status_seconds:
operator_status_condition_current_status_seconds{kind="NodeClaim", name="<node claim name>",status="True", type="Launched"} - ignoring(type,reason) operator_status_condition_current_status_seconds{kind="NodeClaim", name="<node claim name>",status="True", type="Ready"}
Comparing this number, which is the number of seconds from Launched to Ready, with the Karpenter logs or the nodeclaim.status section, it seems to show the right values.
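For anyone who wants to repeat that cross-check, here is a rough Python sketch, not an official tool. It builds the same query for a given NodeClaim name and computes the Launched to Ready gap from a NodeClaim's status.conditions list, assuming the conditions carry the standard Kubernetes type, status, and lastTransitionTime fields. The function names and the sample conditions are made up for illustration.

```python
from datetime import datetime

METRIC = "operator_status_condition_current_status_seconds"


def launched_to_ready_query(nodeclaim_name: str) -> str:
    """Build the PromQL expression from the comment above for one NodeClaim."""
    def selector(cond_type: str) -> str:
        return (
            f'{METRIC}{{kind="NodeClaim", name="{nodeclaim_name}", '
            f'status="True", type="{cond_type}"}}'
        )
    return f'{selector("Launched")} - ignoring(type, reason) {selector("Ready")}'


def _parse_k8s_time(value: str) -> datetime:
    # Kubernetes serializes timestamps as RFC 3339, e.g. "2024-05-01T10:00:05Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_between_conditions(conditions, start_type="Launched", end_type="Ready"):
    """Seconds between the lastTransitionTime of two True conditions, or None."""
    times = {}
    for cond in conditions:
        if cond.get("status") == "True" and "lastTransitionTime" in cond:
            times[cond["type"]] = _parse_k8s_time(cond["lastTransitionTime"])
    if start_type not in times or end_type not in times:
        return None
    return (times[end_type] - times[start_type]).total_seconds()


# Hypothetical status.conditions content, e.g. taken from
# `kubectl get nodeclaim <node claim name> -o json`.
sample_conditions = [
    {"type": "Launched", "status": "True", "lastTransitionTime": "2024-05-01T10:00:05Z"},
    {"type": "Registered", "status": "True", "lastTransitionTime": "2024-05-01T10:00:40Z"},
    {"type": "Initialized", "status": "True", "lastTransitionTime": "2024-05-01T10:01:10Z"},
    {"type": "Ready", "status": "True", "lastTransitionTime": "2024-05-01T10:01:10Z"},
]

if __name__ == "__main__":
    print(launched_to_ready_query("<node claim name>"))
    print(seconds_between_conditions(sample_conditions))
```

If the value returned by the query is close to the number computed from the conditions, the metric is likely measuring what you expect.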
Description
What problem are you trying to solve?
Organizations would like to have a metric that shows the time it takes for a node to boot and reach the Ready state in Kubernetes. Currently, the only startup-duration metric Karpenter exposes to Prometheus is karpenter_pods_startup_duration_seconds.
How important is this feature to you?
With highly dynamic workloads and multi-tenant environments, node startup time becomes an important metric that impacts multiple dimensions. During scaling events, it is crucial for organizations to be able to measure overall scaling behavior based on this metric.