DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

Supplying `host` tag when submitting a metric results in `DD_TAGS` specified tags to not be attached #6245

Open millerick opened 4 years ago

millerick commented 4 years ago

Output of the info page (if this is a bug)

Getting the status from the agent.

===============
Agent (v7.17.0)
===============

  Status date: 2020-08-14 23:14:47.164554 UTC
  Agent start: 2020-08-05 23:14:40.003844 UTC
  Pid: 352
  Go Version: go1.12.9
  Python Version: 3.7.6
  Build arch: amd64
  Check Runners: 4
  Log Level: debug

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: 1.022ms
    System UTC time: 2020-08-14 23:14:47.164554 UTC

  Host Info
  =========
    bootTime: 2020-05-02 03:04:29.000000 UTC
    kernelVersion: 4.4.0-1101-aws
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 70
    uptime: 2300h10m16s
    virtualizationRole: guest
    virtualizationSystem: xen

  Hostnames
  =========
    ec2-hostname: ip-10-10-9-62.ec2.internal
    host_aliases: [ip-10-10-9-62.ec2.internal]
    hostname: ip-10-10-9-62.ec2.internal
    instance-id: i-0418b6cae4b076697
    socket-fqdn: dd-agent-22tm6
    socket-hostname: dd-agent-22tm6
    host tags:
      cluster_type:kubernetes
      kubernetescluster:sandbox.k8s.centrio.com
      cluster_name:sandbox.k8s.centrio.com
      environment:sandbox
      kube_node_role:node
      kube_node_role:core-node
    hostname provider: configuration

  Metadata
  ========
    cloud_provider: AWS
    hostname_source: configuration

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 6, Total: 311,034
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    disk (2.6.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 284, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1.522s

    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 512, Total: 1 M
      Events: Last Run: 0, Total: 42
      Service Checks: Last Run: 1, Total: 51,840
      Average Execution Time : 79ms

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 5, Total: 259,200
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    fluentd (1.5.0)
    ---------------
      Instance ID: fluentd:a7ad8395a235730 [OK]
      Configuration Source: kubelet:docker://128d1e53aeee0ece0469522627b0bea5c010e3bf5d21c64ee1187a18107d4971
      Total Runs: 23,765
      Metric Samples: Last Run: 5, Total: 118,825
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 23,765
      Average Execution Time : 61ms

    http_check (4.6.3)
    ------------------
      Instance ID: http_check:etcd:555f04d8819efe8d [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/http_check.yaml
      Total Runs: 51,840
      Metric Samples: Last Run: 5, Total: 259,200
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 2, Total: 103,680
      Average Execution Time : 144ms

      Instance ID: http_check:registry:70f468b5e2e82a72 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/http_check.yaml
      Total Runs: 51,841
      Metric Samples: Last Run: 5, Total: 259,205
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 2, Total: 103,682
      Average Execution Time : 212ms

      Instance ID: http_check:vault:78f6fdb97c23bf67 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/http_check.yaml
      Total Runs: 51,840
      Metric Samples: Last Run: 5, Total: 259,200
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 2, Total: 103,680
      Average Execution Time : 214ms

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 104, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms

    kubelet (3.5.1)
    ---------------
      Instance ID: kubelet:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 723, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 207,360
      Average Execution Time : 1.62s

    kubernetes_state (5.1.0)
    ------------------------
      Instance ID: kubernetes_state:da1c302ce5bb9929 [OK]
      Configuration Source: kubelet:docker://2cc52cf536041fe466bcc9b641af9969c6f30a5f043f89329edc08294bbda95f
      Total Runs: 5,587
      Metric Samples: Last Run: 79,593, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 3,332, Total: 1 M
      Average Execution Time : 11.857s

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 6, Total: 311,040
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 17, Total: 881,280
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    network (1.14.0)
    ----------------
      Instance ID: network:e0204ad63d43c949 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 51,840
      Metric Samples: Last Run: 31, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 277ms

    nginx (3.6.0)
    -------------
      Instance ID: nginx:bc19242fd3bdbe8d [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/nginx.yaml
      Total Runs: 51,840
      Metric Samples: Last Run: 7, Total: 362,873
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 51,840
      Average Execution Time : 74ms
      metadata:
        version.major: 1
        version.minor: 15
        version.patch: 8
        version.raw: 1.15.8
        version.scheme: semver

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 865
      Metric Samples: Last Run: 1, Total: 865
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 865
      Average Execution Time : 692ms

    openmetrics (1.4.0)
    -------------------
      Instance ID: openmetrics:kubernetes_state:2818390c49ff5749 [WARNING]
      Configuration Source: kubelet:docker://2cc52cf536041fe466bcc9b641af9969c6f30a5f043f89329edc08294bbda95f
      Total Runs: 5,886
      Metric Samples: Last Run: 2,000, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 5,886
      Average Execution Time : 6.844s

      Warning: Check openmetrics exceeded limit of 2000 metrics, ignoring next ones

      Instance ID: openmetrics:node_exporter:5381bacdc75f9a42 [OK]
      Configuration Source: kubelet:docker://bb871b80035f426136eac1eecbc3df330d6c0026aa69c4628d2efd15b809fbff
      Total Runs: 51,840
      Metric Samples: Last Run: 6, Total: 311,040
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 51,840
      Average Execution Time : 231ms

    prometheus (3.2.0)
    ------------------
      Instance ID: prometheus:calico:6e2d440ca6b49c58 [OK]
      Configuration Source: kubelet:docker://5cf397ef5e81f87cbc0fc3736d633b98550c3e0916850b322afef0bd61b081a9
      Total Runs: 51,841
      Metric Samples: Last Run: 142, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 51,841
      Average Execution Time : 81ms

    tls (1.4.0)
    -----------
      Instance ID: tls:fb78d1d72f83021 [OK]
      Configuration Source: kubelet:docker://c745d8c7865cc5a83cbdacb66fd734c3cbeba9a9667b9672435d4250bcb3af59
      Total Runs: 51,840
      Metric Samples: Last Run: 2, Total: 103,680
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 207,360
      Average Execution Time : 110ms

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 51,841
      Metric Samples: Last Run: 1, Total: 51,841
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 51,840
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 5,643
    Metadata: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    SketchSeries: 0
    Success: 715,431
    TimeseriesV1: 51,840

  API Keys status
  ===============
    API key ending with 05c5f: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 05c5f

==========
Logs Agent
==========

  Logs Agent is not running

=========
Aggregator
=========
  Checks Metric Sample: 3.5 G
  Dogstatsd Metric Sample: 21.4 M
  Event: 43
  Events Flushed: 43
  Number Of Flushes: 51,840
  Series Flushed: 3.5 G
  Service Check: 131.8 M
  Service Checks Flushed: 131.9 M

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 21.4 M
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 490.6 M
  Udp Packet Reading Errors: 0
  Udp Packets: 6.1 M
  Uds Bytes: 1.1 G
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 3 M

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://10.101.17.200:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 1.4.0+commit.f102bd8

Describe what happened: We tried supplying the host tag with a metric to prevent the agent from attaching its own host information, since the host information is not relevant for some of our metrics. When we did this, the tags specified via the DD_TAGS environment variable were not attached to the metric, but our manually specified host tag was.

Describe what you expected: Global tags specified by DD_TAGS should always be attached to a metric, regardless of what other tags are submitted. Tag values that are explicitly submitted with a metric should take precedence over the values specified in DD_TAGS.
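The expected precedence could be sketched as a simple key-based merge, assuming tags are `key:value` strings (purely illustrative; this is not the agent's actual implementation):

```python
def merge_tags(global_tags, metric_tags):
    """Attach all global (DD_TAGS) tags, letting an explicitly submitted
    tag value win when the same key appears in both lists."""
    merged = {t.split(":", 1)[0]: t for t in global_tags}
    merged.update({t.split(":", 1)[0]: t for t in metric_tags})
    return sorted(merged.values())

# DD_TAGS supplies environment and team; the metric overrides environment
# and adds its own host tag:
merge_tags(["environment:sandbox", "team:infra"],
           ["environment:prod", "host:foobar"])
# -> ["environment:prod", "host:foobar", "team:infra"]
```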

Steps to reproduce the issue:

  1. Configure the agent to attach environment:sandbox using DD_TAGS
  2. Submit host:foobar as a tag along with a metric
  3. Observe that the host:foobar tag appears in Datadog, but the environment:sandbox tag does not. If host:foobar is not submitted as a tag on the metric, the environment:sandbox tag appears as expected.
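Step 2 can be reproduced with a plain DogStatsD datagram. The sketch below builds the documented plaintext payload format and sends it over UDP to the agent's default port; the metric name and tag value are placeholders:

```python
import socket

def dogstatsd_gauge(name, value, tags):
    # DogStatsD plaintext format: <name>:<value>|g|#<tag>,<tag>,...
    tag_part = "|#" + ",".join(tags) if tags else ""
    return f"{name}:{value}|g{tag_part}"

# Submit a gauge with an explicit host tag, as in step 2
payload = dogstatsd_gauge("myapp.queue_depth", 12, ["host:foobar"])

# Fire-and-forget UDP send to the agent's default DogStatsD port (8125)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(payload.encode("utf-8"), ("127.0.0.1", 8125))
```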

Additional environment details (Operating System, Cloud provider, etc): I believe enough information is specified in the agent information page. Let me know if more information would be useful.

Simwar commented 4 years ago

Hey @millerick

Thanks for reaching out! There is a simple explanation for this behavior: host tags are not sent along with metrics, logs, or traces. They are sent once, when the agent starts, along with other host-related data such as the hostname. Host tags are then reconciled on our backend thanks to... the hostname!

So, if you change the host tag on one of your custom metrics or check metrics, the backend will not be able to find the right host and, thus, the right tags.
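In other words, the backend effectively joins metric tags with host tags keyed on the hostname, roughly like this toy model (purely illustrative; not Datadog's actual backend logic, and the data is made up):

```python
# Host tags reported once at agent startup, keyed by hostname (toy data)
host_tags_by_hostname = {
    "ip-10-10-9-62.ec2.internal": ["environment:sandbox", "cluster_type:kubernetes"],
}

def effective_tags(metric_tags, host_tags_by_hostname):
    # Pull the host: tag off the metric and look up that host's tags
    host = next((t.split(":", 1)[1] for t in metric_tags if t.startswith("host:")), None)
    return metric_tags + host_tags_by_hostname.get(host, [])

# Overriding the host tag breaks the join, so no DD_TAGS tags come through:
effective_tags(["host:foobar"], host_tags_by_hostname)
# -> ["host:foobar"]

# With the agent's real hostname, the host tags are reconciled:
effective_tags(["host:ip-10-10-9-62.ec2.internal"], host_tags_by_hostname)
# -> ["host:ip-10-10-9-62.ec2.internal", "environment:sandbox", "cluster_type:kubernetes"]
```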

I hope it makes sense!

Let us know if you have further questions.

millerick commented 4 years ago

@Simwar, is there a way to omit the host tag while still having the DD_TAGS tags applied? Context: the cardinality that the host tag creates for some metrics is large and rarely useful for our own observability purposes, whereas the tags added via DD_TAGS are often useful and low-cardinality.

millerick commented 4 years ago

@Simwar, another question: if the agent sends host tags only once on startup, how long does the hostname-to-DD_TAGS mapping stay in the system? I.e., if we start an agent on a host, supply that hostname as the host tag for a multitude of metrics, and then the host ceases to exist at some point in the future, will the original DD_TAGS eventually cease to be associated with those metrics?

millerick commented 4 years ago

@Simwar, following up on this question:

Is there a way to omit the host tag while still having the DD_TAGS tags applied? Context: the cardinality that the host tag creates for some metrics is large and rarely useful for our own observability purposes, whereas the tags added via DD_TAGS are often useful and low-cardinality.