DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

kubernetes_state.* metrics being tagged incorrectly with labels from the kube-state-metrics pod #2671

Open mellowplace opened 5 years ago

mellowplace commented 5 years ago

Output of the info page (if this is a bug)

Getting the status from the agent.

==============
Agent (v6.6.0)
==============

  Status date: 2018-11-15 10:48:23.952742 UTC
  Pid: 381
  Python Version: 2.7.15
  Logs: 
  Check Runners: 4
  Log Level: debug

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -31µs
    System UTC time: 2018-11-15 10:48:23.952742 UTC

  Host Info
  =========
    bootTime: 2018-09-12 16:25:21.000000 UTC
    kernelVersion: 4.14.56+
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: buster/sid
    procs: 70
    uptime: 1530h11m56s

  Hostnames
  =========
    host_aliases: [gke-staging-cluster-2-blue-07810bcd-xkb8.xxxx gke-staging-cluster-2-blue-07810bcd-xkb8]
    hostname: gke-staging-cluster-2-blue-07810bcd-xkb8.c.xxxx.internal
    socket-fqdn: dd-agent-nl8b6
    socket-hostname: dd-agent-nl8b6
    hostname provider: gce
    unused hostname providers:
      configuration/environment: hostname is empty

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
        Instance ID: cpu [OK]
        Total Runs: 44
        Metric Samples: 6, Total: 258
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

    disk (1.4.0)
    ------------
        Instance ID: disk:e5dffb8bef24336f [OK]
        Total Runs: 44
        Metric Samples: 214, Total: 9,416
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 81ms

    docker
    ------
        Instance ID: docker [OK]
        Total Runs: 45
        Metric Samples: 562, Total: 25,330
        Events: 0, Total: 0
        Service Checks: 1, Total: 45
        Average Execution Time : 39ms

    file_handle
    -----------
        Instance ID: file_handle [OK]
        Total Runs: 44
        Metric Samples: 5, Total: 220
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

    io
    --
        Instance ID: io [OK]
        Total Runs: 45
        Metric Samples: 143, Total: 6,336
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

    kubelet (2.2.0)
    ---------------
        Instance ID: kubelet:510263978455d418 [OK]
        Total Runs: 44
        Metric Samples: 746, Total: 32,840
        Events: 0, Total: 0
        Service Checks: 3, Total: 132
        Average Execution Time : 379ms

    kubernetes_apiserver
    --------------------
        Instance ID: kubernetes_apiserver [OK]
        Total Runs: 44
        Metric Samples: 0, Total: 0
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

    load
    ----
        Instance ID: load [OK]
        Total Runs: 45
        Metric Samples: 6, Total: 270
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

    memory
    ------
        Instance ID: memory [OK]
        Total Runs: 44
        Metric Samples: 17, Total: 748
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

    network (1.7.0)
    ---------------
        Instance ID: network:2a218184ebe03606 [OK]
        Total Runs: 45
        Metric Samples: 152, Total: 6,852
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 5ms

    ntp
    ---
        Instance ID: ntp:b4579e02d1981c12 [OK]
        Total Runs: 44
        Metric Samples: 1, Total: 44
        Events: 0, Total: 0
        Service Checks: 1, Total: 44
        Average Execution Time : 0s

    uptime
    ------
        Instance ID: uptime [OK]
        Total Runs: 45
        Metric Samples: 1, Total: 45
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

  Config Errors
  ==============
    kubernetes_apiserver
    --------------------
      Configuration file contains no valid instances

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  CheckRunsV1: 44
  Dropped: 0
  DroppedOnInput: 0
  Events: 0
  HostMetadata: 0
  IntakeV1: 5
  Metadata: 0
  Requeued: 0
  Retried: 0
  RetryQueueSize: 0
  Series: 0
  ServiceChecks: 0
  SketchSeries: 0
  Success: 93
  TimeseriesV1: 44

  API Keys status
  ===============
    API key ending with 2c78f on endpoint https://app.datadoghq.com: API Key valid

==========
Logs Agent
==========

  Logs Agent is not running

=========
DogStatsD
=========

  Checks Metric Sample: 83,338
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 44
  Series Flushed: 74,931
  Service Check: 711
  Service Checks Flushed: 748
  Dogstatsd Metric Sample: 110

Describe what happened:

kubernetes_state.* metrics are being tagged twice: once with the labels of the actual pod they describe, and again with the labels of the kube-state-metrics pod itself.

Describe what you expected:

For the metrics to be tagged only with the labels of the actual pod referred to by kube-state-metrics.

Steps to reproduce the issue:

kubernetes_state.yaml (this plugin is producing the correct tags)

ad_identifiers:
  - kube-state-metrics

init_config:

instances:
  - kube_state_url: http://kube-state-metrics.default.svc.cluster.local:8080/metrics

    label_joins:
      kube_pod_labels:
        label_to_match: pod
        labels_to_get:
          - label_app
          - label_type
          - label_version

    labels_mapper:
      label_app: kube_label_app
      label_type: kube_label_type
      label_version: kube_label_version

Snippet from kubernetes.yaml (this causes the mistagging with information from the kube-state-metrics pod)

kubernetes_pod_labels_as_tags:
  app: kube_label_app
  type: kube_label_type
  version: kube_label_version
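
The collision arises because the kube-state-metrics pod carries some of these same label keys itself, so the kubernetes_pod_labels_as_tags mapping matches that pod too and its label values end up on every metric scraped from it, including the kubernetes_state.* series. A hypothetical sketch of the offending pod metadata (label values are illustrative, not taken from the issue):

```yaml
# Hypothetical kube-state-metrics pod metadata. Because this pod carries
# an "app" label, the kubernetes_pod_labels_as_tags mapping above attaches
# kube_label_app:kube-state-metrics to every metric collected from it,
# including the kubernetes_state.* series it serves.
apiVersion: v1
kind: Pod
metadata:
  name: kube-state-metrics-xxxxx
  labels:
    app: kube-state-metrics   # collides with the "app" entry in the mapping
```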

Additional environment details (Operating System, Cloud provider, etc):

Kubernetes

ellieayla commented 5 years ago

This happens because of a collision: the labels you're using to tag the pods under monitoring are also present on the kube-state-metrics pods themselves. I haven't found a way to make the Datadog agent special-case and ignore the labels on the kube-state-metrics pods, so instead I used a different, non-standard label (system-app) for the kube-state-metrics Deployment and its pod label selector.
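
A minimal sketch of that workaround, with illustrative names: the Deployment labels its pods with a key (system-app) that does not appear in kubernetes_pod_labels_as_tags, so the agent's label-to-tag mapping never matches the kube-state-metrics pod.

```yaml
# Sketch of the workaround (names and image are illustrative): label the
# kube-state-metrics pods with a non-standard key so that
# kubernetes_pod_labels_as_tags (which matches "app", "type", "version")
# does not pick them up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
spec:
  selector:
    matchLabels:
      system-app: kube-state-metrics   # non-standard key, deliberately not "app"
  template:
    metadata:
      labels:
        system-app: kube-state-metrics
    spec:
      containers:
        - name: kube-state-metrics
          image: quay.io/coreos/kube-state-metrics:v1.9.8  # version illustrative
```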

duxing commented 4 years ago

Thanks for sharing this! I was wondering if someone from Datadog could give some direction on this issue.

duxing commented 4 years ago

@mellowplace this is resolved for me by setting this flag in configuration: https://github.com/DataDog/integrations-core/blob/master/kubernetes_state/datadog_checks/kubernetes_state/data/auto_conf.yaml#L7

Related release note: you need to set the configuration option to true explicitly if you use your own configuration YAML. If you're still troubled by this issue, try upgrading your Datadog chart to the latest version (the agent needs to be 7.16/6.16 or above) and updating your config; that should work for you.
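
A sketch of that fix applied to the kubernetes_state.yaml from the original report. The flag name (join_standard_tags) is my reading of the linked auto_conf.yaml at the time of writing, not stated in this thread — verify it against the file for your integration version:

```yaml
# Sketch of the fix: set the flag from the linked auto_conf.yaml explicitly.
# The option name (join_standard_tags) may differ across integration
# versions; check the linked file for your release.
ad_identifiers:
  - kube-state-metrics

init_config:

instances:
  - kube_state_url: http://kube-state-metrics.default.svc.cluster.local:8080/metrics
    join_standard_tags: true   # must be set explicitly in a custom config yaml
```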