aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Karpenter metrics documentation should be improved #6551

Closed hitsub2 closed 1 week ago

hitsub2 commented 1 month ago

Description

How can the docs be improved? For example, the documentation for the following metrics should be improved.

1. karpenter_nodes_termination_time_seconds

Examples

karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="celeborn-worker-graviton", quantile="0.9", service="karpenter"} 0.229906829
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="celeborn-worker-graviton", quantile="0.99", service="karpenter"} 0.229906829
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="celeborn-worker-graviton", quantile="1", service="karpenter"} 0.229906829
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-compute-optimized", quantile="0", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-compute-optimized", quantile="0.5", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-compute-optimized", quantile="0.9", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-compute-optimized", quantile="0.99", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-compute-optimized", quantile="1", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-memory-optimized", quantile="0", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-memory-optimized", quantile="0.5", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-memory-optimized", quantile="0.9", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-memory-optimized", quantile="0.99", service="karpenter"} NaN
karpenter_nodes_termination_time_seconds{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-memory-optimized", quantile="1", service="karpenter"} NaN
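To make the samples above easier to read, here is a minimal Python sketch that groups them by `nodepool` and `quantile`. It assumes the `quantile` label follows the usual Prometheus summary convention and that `NaN` means no termination was observed for that NodePool in the current window. Neither assumption is confirmed by the Karpenter docs, which is the point of this issue. The function names `parse_sample` and `quantiles_by_nodepool` are hypothetical, and the sample lines are abridged to a few labels.

```python
import math
import re
from collections import defaultdict

# Matches `name{labels} value` in the form shown in the samples above.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Matches one `key="value"` pair; tolerates the ", " separator used above.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    # float() accepts "NaN", so empty quantiles parse without special casing.
    return match.group("name"), labels, float(match.group("value"))


def quantiles_by_nodepool(lines):
    """Return {nodepool: {quantile: seconds}}, skipping NaN samples."""
    result = defaultdict(dict)
    for line in lines:
        _, labels, value = parse_sample(line)
        if math.isnan(value):
            # Assumption: NaN means "no observations", so there is nothing to report.
            continue
        result[labels["nodepool"]][float(labels["quantile"])] = value
    return dict(result)


# Abridged copies of two samples from this issue (most labels dropped).
SAMPLES = [
    'karpenter_nodes_termination_time_seconds{nodepool="celeborn-worker-graviton", '
    'quantile="0.9", service="karpenter"} 0.229906829',
    'karpenter_nodes_termination_time_seconds{nodepool="spark-compute-optimized", '
    'quantile="0.5", service="karpenter"} NaN',
]

if __name__ == "__main__":
    print(quantiles_by_nodepool(SAMPLES))
```

Note that none of the labels in these samples identifies an individual node. The finest granularity available is the `nodepool` label, which is what question B below is about.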

Improvement

A. What does the quantile label mean?

B. How can we get the node information for a sample?

2. karpenter_nodes_leases_deleted

Improvement

What are the leases?

3. Provisioner-related metrics are lost after migrating to NodePool

NodePool lacks metrics such as karpenter_provisioner_scheduling_simulation_duration_seconds and karpenter_provisioner_scheduling_duration_seconds.

4. karpenter_nodeclaims_terminated

karpenter_nodeclaims_terminated{app_kubernetes_io_instance="karpenter", app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="karpenter", app_kubernetes_io_version="0.36.0", helm_sh_chart="karpenter-0.36.0", instance="100.64.81.229:8000", job="kubernetes-service-endpoints", kubernetes_namespace="karpenter", nodepool="spark-memory-optimized", reason="insufficient_capacity", service="karpenter"} 1

Improvement

What does the reason insufficient_capacity mean?

5. Disruption metrics

karpenter_nodeclaims_disrupted

Improvement

What is disruption?

6. karpenter_interruption_received_messages

Improvement

How many interruption reasons (message types) are there?

engedaam commented 1 month ago

A. What does the quantile label mean?

B. How can we get the node information for a sample?

3. Provisioner-related metrics are lost after migrating to NodePool. NodePool lacks metrics such as karpenter_provisioner_scheduling_simulation_duration_seconds and karpenter_provisioner_scheduling_duration_seconds.

B. What are the leases?

C. What is disruption?

D. What does the reason insufficient_capacity mean?

github-actions[bot] commented 3 weeks ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.