aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Metric for Node Launch/Startup time #7171

Open dumlutimuralp opened 1 month ago

dumlutimuralp commented 1 month ago

Description

What problem are you trying to solve?

Organizations would like a metric that shows how long it takes for a node to boot and reach the Ready state in Kubernetes. Currently, the only startup-related metric among Karpenter's Prometheus metrics is karpenter_pods_startup_duration_seconds.

How important is this feature to you?

With highly dynamic workloads and multi-tenant environments, node startup time becomes an important metric that affects several dimensions. During scaling events, organizations need this metric to measure overall scaling behavior.

njtran commented 1 month ago

This should be captured by the metrics for the NodeClaim Initialized status condition. Details on when this status condition is set are available here: https://karpenter.sh/docs/reference/metrics/#status-condition-metrics

TL;DR: a Karpenter node is ready for pod scheduling and use once its NodeClaim has the status condition Initialized=True. More information on what Launched/Registered/Initialized mean is here: https://karpenter.sh/docs/concepts/nodeclaims/
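If you want a per-node number without going through Prometheus, a rough alternative is to read the condition timestamps straight off the NodeClaim. The sketch below assumes the NodeClaim JSON (for example from `kubectl get nodeclaim <name> -o json`) carries the standard Kubernetes `metadata.creationTimestamp` and a `status.conditions` list with `type`, `status` and `lastTransitionTime`; the helper names are made up for illustration. Keep in mind `lastTransitionTime` records the most recent transition, so a condition that flapped will report its latest flip, not its first.

```python
from datetime import datetime

# Lifecycle conditions described in the NodeClaim docs linked above.
LIFECYCLE = ("Launched", "Registered", "Initialized")


def _parse_ts(ts: str) -> datetime:
    # Kubernetes timestamps are RFC 3339 in UTC, e.g. "2024-10-10T12:00:05Z".
    # Swap "Z" for "+00:00" so fromisoformat() also works on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def nodeclaim_phase_seconds(nodeclaim: dict) -> dict:
    """Hypothetical helper: seconds from NodeClaim creation until each
    lifecycle condition last became True.

    Conditions that are missing or not (yet) True are left out.
    """
    created = _parse_ts(nodeclaim["metadata"]["creationTimestamp"])
    result = {}
    for cond in nodeclaim.get("status", {}).get("conditions", []):
        if cond.get("type") in LIFECYCLE and cond.get("status") == "True":
            changed = _parse_ts(cond["lastTransitionTime"])
            result[cond["type"]] = (changed - created).total_seconds()
    return result


if __name__ == "__main__":
    # Illustrative timestamps only, not real measurements.
    sample = {
        "metadata": {"creationTimestamp": "2024-10-10T12:00:00Z"},
        "status": {"conditions": [
            {"type": "Launched", "status": "True", "lastTransitionTime": "2024-10-10T12:00:05Z"},
            {"type": "Registered", "status": "True", "lastTransitionTime": "2024-10-10T12:00:32Z"},
            {"type": "Initialized", "status": "True", "lastTransitionTime": "2024-10-10T12:00:41Z"},
        ]},
    }
    print(nodeclaim_phase_seconds(sample))
```

The "Initialized" entry is the creation-to-usable time described above; subtracting two entries gives the time spent between phases.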

youwalther65 commented 1 month ago

@dumlutimuralp It seems one can use the metric operator_status_condition_current_status_seconds:

operator_status_condition_current_status_seconds{kind="NodeClaim", name="<node claim name>",status="True", type="Launched"} - ignoring(type,reason) operator_status_condition_current_status_seconds{kind="NodeClaim", name="<node claim name>",status="True", type="Ready"}

This number is the time in seconds from Launched to Ready. Comparing it against the Karpenter logs and the nodeclaim.status section, it does seem to show the right values.
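The subtraction works because operator_status_condition_current_status_seconds reports how long a condition has held its current status: Launched became True earlier than Ready, so its gauge is larger, and the difference is the gap between the two transitions (only while both conditions are still True). A minimal sketch for pulling this for every NodeClaim at once through the standard Prometheus HTTP API (`/api/v1/query`) is below. It drops the `name="..."` filter and assumes that, once `type` and `reason` are ignored, the remaining labels match one-to-one per NodeClaim; the Prometheus URL and helper names are assumptions.

```python
import json
import urllib.parse
import urllib.request

METRIC = "operator_status_condition_current_status_seconds"

# Same expression as in the comment above, minus the name="..." filter,
# so the instant vector holds one sample per NodeClaim.
QUERY = (
    f'{METRIC}{{kind="NodeClaim",status="True",type="Launched"}}'
    ' - ignoring(type,reason) '
    f'{METRIC}{{kind="NodeClaim",status="True",type="Ready"}}'
)


def query_prometheus(base_url: str, query: str = QUERY) -> dict:
    """Run an instant query against the Prometheus HTTP API and return the JSON body."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def launch_to_ready_seconds(response: dict) -> dict:
    """Hypothetical helper: map NodeClaim name -> seconds from Launched to Ready."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    out = {}
    for sample in response["data"]["result"]:
        # Instant-vector samples look like {"metric": {...labels}, "value": [ts, "number"]}.
        # The name label survives because only type and reason are ignored.
        name = sample["metric"].get("name", "<unknown>")
        out[name] = float(sample["value"][1])
    return out


if __name__ == "__main__":
    # Assumes Prometheus is reachable locally, e.g. via kubectl port-forward.
    print(launch_to_ready_seconds(query_prometheus("http://localhost:9090")))
```

Because the gauge only exists while the NodeClaim exists, this gives a snapshot of current nodes; recording the result (e.g. with a recording rule) would be needed for history.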