Open JacobHenner opened 4 weeks ago
For spectators: I'm told that #18448 is expected to be included in datadog-agent 7.58. In the meantime, you can continue to ingest metrics from Karpenter>=1.0.0 using the following configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/controller.checks: |
          {
            "karpenter": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics",
                  "extra_metrics": [
                    {
                      "karpenter_nodes_termination_duration_seconds": "nodes.termination.time_seconds"
                    },
                    {
                      "karpenter_pods_startup_duration_seconds": "pods.startup.time_seconds"
                    },
                    {
                      "karpenter_voluntary_disruption_queue_failures": "disruption.replacement.nodeclaim.failures"
                    },
                    {
                      "karpenter_voluntary_disruption_decision_evaluation_duration_seconds": "disruption.evaluation.duration_seconds"
                    },
                    {
                      "karpenter_voluntary_disruption_eligible_nodes": "disruption.eligible_nodes"
                    },
                    {
                      "karpenter_voluntary_disruption_consolidation_timeouts": "disruption.consolidation_timeouts"
                    },
                    {
                      "karpenter_nodepools_allowed_disruptions": "disruption.budgets.allowed_disruptions"
                    },
                    {
                      "karpenter_voluntary_disruption_decisions": "disruption.actions_performed"
                    },
                    {
                      "karpenter_scheduler_scheduling_duration_seconds": "provisioner.scheduling.simulation.duration_seconds"
                    },
                    {
                      "karpenter_scheduler_queue_depth": "provisioner.scheduling.queue_depth"
                    },
                    {
                      "karpenter_interruption_message_queue_duration_seconds": "interruption.message.latency.time_seconds"
                    },
                    {
                      "karpenter_nodepools_usage": "nodepool_usage"
                    },
                    {
                      "karpenter_nodepools_limit": "nodepool_limit"
                    }
                  ]
                }
              ]
            }
          }
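Because the annotation value is JSON embedded in a YAML block scalar, a stray comma or brace is easy to miss when editing it by hand. As a rough sketch (not part of the integration), the same JSON could be generated from a plain mapping and pasted into the annotation. The names `RENAMED_METRICS` and `build_checks_annotation` are hypothetical helpers introduced only for this example; the metric names and endpoint are copied from the configuration above.

```python
import json

# Karpenter >= 1.0.0 metric name -> Datadog metric name,
# copied from the workaround annotation in this thread.
RENAMED_METRICS = {
    "karpenter_nodes_termination_duration_seconds": "nodes.termination.time_seconds",
    "karpenter_pods_startup_duration_seconds": "pods.startup.time_seconds",
    "karpenter_voluntary_disruption_queue_failures": "disruption.replacement.nodeclaim.failures",
    "karpenter_voluntary_disruption_decision_evaluation_duration_seconds": "disruption.evaluation.duration_seconds",
    "karpenter_voluntary_disruption_eligible_nodes": "disruption.eligible_nodes",
    "karpenter_voluntary_disruption_consolidation_timeouts": "disruption.consolidation_timeouts",
    "karpenter_nodepools_allowed_disruptions": "disruption.budgets.allowed_disruptions",
    "karpenter_voluntary_disruption_decisions": "disruption.actions_performed",
    "karpenter_scheduler_scheduling_duration_seconds": "provisioner.scheduling.simulation.duration_seconds",
    "karpenter_scheduler_queue_depth": "provisioner.scheduling.queue_depth",
    "karpenter_interruption_message_queue_duration_seconds": "interruption.message.latency.time_seconds",
    "karpenter_nodepools_usage": "nodepool_usage",
    "karpenter_nodepools_limit": "nodepool_limit",
}


def build_checks_annotation(endpoint: str, renamed: dict) -> str:
    """Return the JSON string for the ad.datadoghq.com/<container>.checks annotation."""
    config = {
        "karpenter": {
            "init_config": {},
            "instances": [
                {
                    "openmetrics_endpoint": endpoint,
                    # One single-key object per metric, matching the layout used above.
                    "extra_metrics": [{src: dst} for src, dst in renamed.items()],
                }
            ],
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # %%host%% is an Autodiscovery template variable resolved by the agent.
    print(build_checks_annotation("http://%%host%%:8080/metrics", RENAMED_METRICS))
```

The printed JSON can be placed under the `|` block scalar, indented to match the annotation key.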
If I'm using the helm chart, where does this code go? Is it under the agents section of the chart?
So far I have not been able to get this working.
It goes under podAnnotations, like:
podAnnotations:
  ad.datadoghq.com/controller.checks: |
    {
      "karpenter": {
        "init_config": {},
        "instances": [
          {
            "openmetrics_endpoint": "http://%%host%%:%%port_1%%/metrics",
            "extra_metrics": [
              {
                "karpenter_nodes_termination_duration_seconds": "nodes.termination.time_seconds"
              },
              {
                "karpenter_pods_startup_duration_seconds": "pods.startup.time_seconds"
              },
              ...
Karpenter's 1.0.0 release renames several metrics. After upgrading to 1.0.0, new data points for the previously reported metrics are no longer accessible in Datadog.
Steps to reproduce the issue:
Describe the results you received:
Several metrics are no longer reported
Describe the results you expected:
Metrics continue to report (or continue to report following a datadog-agent upgrade)
Additional information you deem important (e.g. issue happens only occasionally):
I can submit a PR to modify the integration, but I am not sure whether there's an existing convention for renaming both the input and output metric names, or only the input (to maintain continuity with pre-existing monitors, dashboards, etc.). I'll gladly submit a PR once guidance is provided.