aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0
6.84k stars 963 forks

Datadog + Karpenter metrics #2664

Closed: ishworg closed 2 years ago

ishworg commented 2 years ago

Is an existing page relevant?

No response

What karpenter features are relevant?

Prometheus metrics endpoint

How should the docs be improved?

An example configuration showing the Datadog agent scraping the Prometheus metrics exposed by Karpenter pods would be awesome. It should cover the configuration needed on the Karpenter pods as well as on the Datadog agent.

Community Note

Please react 👍🏼 if others feel this information is valuable.

bwagner5 commented 2 years ago

We do have a Prometheus metrics endpoint, with docs here: https://karpenter.sh/v0.18.0/tasks/metrics/

The Karpenter team doesn't really have the expertise with Datadog to document this accurately. We're also trying to reduce the surface area our docs cover so that we can accurately maintain the docs we have. For example, the Terraform docs are a little out of hand since we're not experts in Terraform and that ecosystem moves fast in terms of module upgrades.

ishworg commented 2 years ago

@bwagner5 fair enough. I needed to solve this for our own usage. Here is a working values file for Karpenter v0.16.2 that lets the Datadog agent scrape it via Datadog AD v1. We are only interested in ^karpenter.* metrics.

# Datadog Autodiscovery v1 annotations added to every Karpenter pod so Datadog can discover Karpenter's OpenMetrics
# endpoint (Prometheus exposition format).
podAnnotations:
  ad.datadoghq.com/controller.check_names: '["openmetrics"]'
  ad.datadoghq.com/controller.init_configs: '[{}]'
  # https://docs.datadoghq.com/agent/guide/template_variables/
  ad.datadoghq.com/controller.instances: '[{ "openmetrics_endpoint":"http://%%host%%:8080/metrics","namespace":"karpenter","metrics":["^karpenter.*"] }]'
  ad.datadoghq.com/webhook.check_names: '["openmetrics"]'
  ad.datadoghq.com/webhook.init_configs: '[{}]'
  # https://docs.datadoghq.com/agent/guide/template_variables/
  ad.datadoghq.com/webhook.instances: '[{ "openmetrics_endpoint":"http://%%host%%:8080/metrics","namespace":"karpenter","metrics":["^karpenter.*"] }]'
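
Each annotation value above is a JSON string embedded in YAML, so a quoting slip is easy to make. Below is a minimal Python sketch for sanity-checking the values before deploying. It assumes, as the Datadog AD v1 documentation describes, that the three annotations are JSON lists matched up by index. `parse_ad_v1` is a hypothetical helper name, not part of any Datadog or Karpenter tooling.

```python
import json

PREFIX = "ad.datadoghq.com"


def parse_ad_v1(annotations, container):
    """Parse the three AD v1 annotations for one container and check they line up.

    Hypothetical helper for illustration only.
    """
    parsed = {}
    for suffix in ("check_names", "init_configs", "instances"):
        key = f"{PREFIX}/{container}.{suffix}"
        if key not in annotations:
            raise KeyError(f"missing annotation: {key}")
        # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
        value = json.loads(annotations[key])
        if not isinstance(value, list):
            raise ValueError(f"{key} must be a JSON list")
        parsed[suffix] = value
    # Assumption: entry i of each list describes the same check.
    if len({len(v) for v in parsed.values()}) != 1:
        raise ValueError("check_names, init_configs and instances differ in length")
    return parsed


# Values copied from the controller annotations above.
annotations = {
    "ad.datadoghq.com/controller.check_names": '["openmetrics"]',
    "ad.datadoghq.com/controller.init_configs": "[{}]",
    "ad.datadoghq.com/controller.instances": (
        '[{ "openmetrics_endpoint":"http://%%host%%:8080/metrics",'
        '"namespace":"karpenter","metrics":["^karpenter.*"] }]'
    ),
}

parsed = parse_ad_v1(annotations, "controller")
print(parsed["instances"][0]["openmetrics_endpoint"])
```

The sketch only checks the shape of the annotations. It does not contact the agent or confirm that the endpoint is reachable.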

hybby commented 1 year ago

Thank you for the above example, @ishworg! It helped me get this working in our environment.

Correct me if I'm wrong, but the separate webhook and controller checks seem unnecessary and just mean that the metrics are collected twice, right? From what I can see, the portion after the slash in the ad.datadoghq.com/ annotation only identifies the container the check is attached to, and both checks point at the same %%host%%:8080 endpoint.

The metric scraping worked for me with just the following:

podAnnotations:
  ad.datadoghq.com/controller.check_names: '["openmetrics"]'
  ad.datadoghq.com/controller.init_configs: '[{}]'
  ad.datadoghq.com/controller.instances: |
    [
      {
        "openmetrics_endpoint": "http://%%host%%:8080/metrics",
        "namespace": "karpenter",
        "metrics": ["^karpenter.*"]
      }
    ]
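
To see what the "metrics" filter above would keep, here is a small Python sketch. It assumes each entry is treated as a regular expression tried against every exposed metric name, which is worth confirming against the Datadog OpenMetrics check documentation. The metric names are illustrative samples only, and `selected` is a hypothetical helper.

```python
import re

# Filter copied from the annotation above.
metric_filters = ["^karpenter.*"]

# Illustrative names only; the real list depends on the Karpenter version.
exposed = [
    "karpenter_nodes_allocatable",
    "karpenter_pods_state",
    "controller_runtime_reconcile_total",
    "go_goroutines",
    "process_cpu_seconds_total",
]


def selected(names, filters):
    """Return the names matched by at least one filter (hypothetical helper)."""
    patterns = [re.compile(f) for f in filters]
    return [n for n in names if any(p.search(n) for p in patterns)]


kept = selected(exposed, metric_filters)
print(kept)
```

Under that assumption, only names starting with karpenter pass the filter, and the Go runtime and controller-runtime metrics served on the same endpoint are dropped.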

kcadorin-alpha commented 1 month ago

For newer Datadog Agents (v7.36+), use the Autodiscovery v2 annotation format:

podAnnotations:
  ad.datadoghq.com/controller.checks: |
    {
      "karpenter": {
        "init_config": {},
        "instances": [
          {
            "openmetrics_endpoint": "http://%%host%%:8000/metrics"
          }
        ]
      }
    }
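
For anyone carrying the earlier v1 annotations (with the namespace and metrics filter) over to this layout, here is a rough Python sketch of the mapping. It assumes the v2 value is a JSON object keyed by check name, with "init_config" and "instances" beneath each key, as in the annotation above. `ad_v1_to_v2` is a hypothetical helper name.

```python
import json


def ad_v1_to_v2(check_names, init_configs, instances):
    """Fold index-aligned AD v1 lists into a v2-style "checks" object.

    Hypothetical helper for illustration only.
    """
    checks = {}
    for name, init_config, instance in zip(check_names, init_configs, instances):
        # Repeated check names share one entry and accumulate instances.
        entry = checks.setdefault(name, {"init_config": init_config, "instances": []})
        entry["instances"].append(instance)
    return checks


# v1 values taken from the earlier controller-only example in this thread.
checks = ad_v1_to_v2(
    ["openmetrics"],
    [{}],
    [
        {
            "openmetrics_endpoint": "http://%%host%%:8080/metrics",
            "namespace": "karpenter",
            "metrics": ["^karpenter.*"],
        }
    ],
)

annotation_value = json.dumps(checks, indent=2)
print(annotation_value)
```

The printed JSON would go under ad.datadoghq.com/controller.checks. The annotation above uses "karpenter" as the check key and port 8000, while this sketch keeps the "openmetrics" check and port 8080 from the earlier examples, so adjust both to match your agent and Karpenter versions.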