DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

postgres: wrong db tags since 7.48.0 #16111

Open sileht opened 10 months ago

sileht commented 10 months ago

Additional environment details (Operating System, Cloud provider, etc):

Heroku

Steps to reproduce the issue:

Our Postgres server has two databases, db_one and db_two.

Here is our Postgres configuration file, to monitor db_one:

init_config:

instances:
  - host: xxxxxx
    port: 5432
    username: xxxx
    password: xxxx
    dbname: db_one
    dbm: true
    ssl: allow
    query_samples:
      explain_parameterized_queries: false
    custom_queries:
      - metric_prefix: xxxx.db
        query: "SELECT count(*) FROM \"user\""
        columns:
          - name: users.count

Describe the results you received:

Since 7.48.0, our xxx.db.users.count metric is alternately tagged with db: db_one, then db: db_two,db_one, ...

Falling back to 7.47.1 solves the issue.

Describe the results you expected:

Since this is a custom_queries entry and dbname: is set, we expected the db tag on all our custom_queries metrics to be only db: db_one

Additional information you deem important (e.g. issue happens only occasionally):

The comma inside the tag looks suspicious.

Here is the status output from the agent:

===============
Agent (v7.48.1)
===============

  Status date: 2023-10-30 12:34:43.646 UTC (1698669283646)
  Agent start: 2023-10-30 11:25:03.097 UTC (1698665103097)
  Pid: 59
  Go Version: go1.20.8
  Python Version: 3.9.18
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: error

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -823µs
    System time: 2023-10-30 12:34:43.646 UTC (1698669283646)

  Host Info
  =========
    bootTime: 2023-10-06 14:46:43 UTC (1696603603000)
    hostId: xxxxxx
    kernelArch: x86_64
    kernelVersion: 4.4.0-1104-aws
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: 12.2
    procs: 8
    uptime: 572h38m31s

  Hostnames
  =========
    host_aliases: [xxxxxxxx]
    hostname: xxxxx
    socket-fqdn: xxxxx
    socket-hostname: xxxxxxx
    host tags:
      appname:xxxxx
      dyno:xxxxx.1
      dynotype:xxxxx
      env:prod
      service:xxxxx
    hostname provider: configuration

  Metadata
  ========
    agent_version: 7.48.1
    config_apm_dd_url:

    config_dd_url:
    config_logs_dd_url:
    config_logs_socks5_proxy_address:
    config_no_proxy: [169.254.169.254 100.100.100.200]
    config_process_dd_url:
    config_proxy_http:
    config_proxy_https:
    config_site:
    feature_apm_enabled: true
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_cws_network_enabled: true
    feature_cws_remote_config_enabled: false
    feature_cws_security_profiles_enabled: false
    feature_dynamic_instrumentation_enabled: false
    feature_fips_enabled: false
    feature_imdsv2_enabled: false
    feature_logs_enabled: true
    feature_networks_enabled: false
    feature_networks_http_enabled: false
    feature_networks_https_enabled: false
    feature_oom_kill_enabled: false
    feature_otlp_enabled: false
    feature_process_enabled: true
    feature_process_language_detection_enabled: false
    feature_processes_container_enabled: false
    feature_remote_configuration_enabled: true
    feature_tcp_queue_length_enabled: false
    feature_usm_enabled: false
    feature_usm_go_tls_enabled: false
    feature_usm_http2_enabled: false
    feature_usm_http_by_status_code_enabled: false
    feature_usm_istio_enabled: false
    feature_usm_java_tls_enabled: false
    feature_usm_kafka_enabled: false
    flavor: agent
    hostname_source: configuration
    install_method_installer_version: deb_package
    install_method_tool: dpkg
    install_method_tool_version: dpkg-1.21.22
    logs_transport: HTTP
    system_probe_core_enabled: true
    system_probe_gateway_lookup_enabled: true
    system_probe_kernel_headers_download_enabled: false
    system_probe_max_connections_per_message: 600
    system_probe_prebuilt_fallback_enabled: true
    system_probe_protocol_classification_enabled: true
    system_probe_root_namespace_enabled: true
    system_probe_runtime_compilation_enabled: false
    system_probe_telemetry_enabled: true
    system_probe_track_tcp_4_connections: true
    system_probe_track_tcp_6_connections: true
    system_probe_track_udp_4_connections: true
    system_probe_track_udp_6_connections: true

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 278
      Metric Samples: Last Run: 9, Total: 2,495
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-10-30 12:34:36 UTC (1698669276000)
      Last Successful Execution Date : 2023-10-30 12:34:36 UTC (1698669276000)

    disk (5.0.0)
    ------------
      Instance ID: disk:67cc0574430a16ba [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 279
      Metric Samples: Last Run: 290, Total: 80,910
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 14ms
      Last Execution Date : 2023-10-30 12:34:43 UTC (1698669283000)
      Last Successful Execution Date : 2023-10-30 12:34:43 UTC (1698669283000)

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 278
      Metric Samples: Last Run: 5, Total: 1,390
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-10-30 12:34:35 UTC (1698669275000)
      Last Successful Execution Date : 2023-10-30 12:34:35 UTC (1698669275000)

    github_deployment_runtime (unversioned)
    ---------------------------------------
      Instance ID: github_deployment_runtime:b2508b7ffd756543 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/github_deployment_runtime.d/conf.yaml
      Total Runs: 70
      Metric Samples: Last Run: 4, Total: 280
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 2.377s
      Last Execution Date : 2023-10-30 12:34:09 UTC (1698669249000)
      Last Successful Execution Date : 2023-10-30 12:34:09 UTC (1698669249000)

    github_marketplace (unversioned)
    --------------------------------
      Instance ID: github_marketplace:5a4420cc278b2d1 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/github_marketplace.d/conf.yaml
      Total Runs: 2
      Metric Samples: Last Run: 1, Total: 2
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 296ms
      Last Execution Date : 2023-10-30 12:25:07 UTC (1698668707000)
      Last Successful Execution Date : 2023-10-30 12:25:07 UTC (1698668707000)

    github_security_alerts (unversioned)
    ------------------------------------
      Instance ID: github_security_alerts:86f577a57ed652e [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/github_security_alerts.d/conf.yaml
      Total Runs: 1
      Metric Samples: Last Run: 148, Total: 148
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1.506s
      Last Execution Date : 2023-10-30 11:55:07 UTC (1698666907000)
      Last Successful Execution Date : 2023-10-30 11:55:07 UTC (1698666907000)

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 279
      Metric Samples: Last Run: 223, Total: 62,064
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-10-30 12:34:42 UTC (1698669282000)
      Last Successful Execution Date : 2023-10-30 12:34:42 UTC (1698669282000)

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 278
      Metric Samples: Last Run: 6, Total: 1,668
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-10-30 12:34:34 UTC (1698669274000)
      Last Successful Execution Date : 2023-10-30 12:34:34 UTC (1698669274000)

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 279
      Metric Samples: Last Run: 20, Total: 5,580
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-10-30 12:34:41 UTC (1698669281000)
      Last Successful Execution Date : 2023-10-30 12:34:41 UTC (1698669281000)

    network (3.0.0)
    ---------------
      Instance ID: network:4b0649b7e11f0772 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 278
      Metric Samples: Last Run: 78, Total: 21,684
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 2ms
      Last Execution Date : 2023-10-30 12:34:33 UTC (1698669273000)
      Last Successful Execution Date : 2023-10-30 12:34:33 UTC (1698669273000)

    ntp
    ---
      Instance ID: ntp:3c427a42a70bbf8 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 5
      Metric Samples: Last Run: 1, Total: 5
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 5
      Average Execution Time : 2.105s
      Last Execution Date : 2023-10-30 12:25:11 UTC (1698668711000)
      Last Successful Execution Date : 2023-10-30 12:25:11 UTC (1698668711000)

    postgres (14.4.0)
    -----------------
      Instance ID: postgres:f6e8fc3cccfe6c8e [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/postgres.d/conf.yaml
      Total Runs: 279
      Metric Samples: Last Run: 782, Total: 220,853
      Events: Last Run: 0, Total: 0
      Database Monitoring Activity Samples: Last Run: 1, Total: 415
      Database Monitoring Metadata Samples: Last Run: 1, Total: 18
      Database Monitoring Query Metrics: Last Run: 2, Total: 417
      Database Monitoring Query Samples: Last Run: 13, Total: 2,014
      Service Checks: Last Run: 1, Total: 279
      Average Execution Time : 225ms
      Last Execution Date : 2023-10-30 12:34:38 UTC (1698669278000)
      Last Successful Execution Date : 2023-10-30 12:34:38 UTC (1698669278000)
      metadata:
        resolved_hostname: xxxxxxx
        version.major: 15
        version.minor: 3
        version.patch: 0
        version.raw: 15.3
        version.scheme: semver

    process (3.0.0)
    ---------------
      Instance ID: process:xxxxx:def707fdc0059710 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/process.d/conf.yaml
      Total Runs: 278
      Metric Samples: Last Run: 24, Total: 6,670
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 278
      Average Execution Time : 1ms
      Last Execution Date : 2023-10-30 12:34:30 UTC (1698669270000)
      Last Successful Execution Date : 2023-10-30 12:34:30 UTC (1698669270000)

    redisdb (5.1.0)
    ---------------
      Instance ID: redisdb:451b352e23301d38 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/redisdb.d/conf.yaml
      Total Runs: 278
      Metric Samples: Last Run: 62, Total: 17,241
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 278
      Average Execution Time : 5ms
      Last Execution Date : 2023-10-30 12:34:37 UTC (1698669277000)
      Last Successful Execution Date : 2023-10-30 12:34:37 UTC (1698669277000)
      metadata:
        version.major: 6
        version.minor: 2
        version.patch: 13
        version.raw: 6.2.13
        version.scheme: semver

      Instance ID: redisdb:b8fb9147e848e6c3 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/redisdb.d/conf.yaml
      Total Runs: 278
      Metric Samples: Last Run: 63, Total: 17,519
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 278
      Average Execution Time : 28ms
      Last Execution Date : 2023-10-30 12:34:29 UTC (1698669269000)
      Last Successful Execution Date : 2023-10-30 12:34:29 UTC (1698669269000)
      metadata:
        version.major: 6
        version.minor: 2
        version.patch: 13
        version.raw: 6.2.13
        version.scheme: semver

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 279
      Metric Samples: Last Run: 1, Total: 279
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2023-10-30 12:34:40 UTC (1698669280000)
      Last Successful Execution Date : 2023-10-30 12:34:40 UTC (1698669280000)

========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    CustomResource: 0
    CustomResourceDefinition: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    HorizontalPodAutoscaler: 0
    Ingress: 0
    Job: 0
    Namespace: 0
    Node: 0
    OrchestratorManifest: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0
    VerticalPodAutoscaler: 0

  Transaction Successes
  =====================
    Total number: 588
    Successes By Endpoint:
      check_run_v1: 278
      intake: 25
      metadata_v1: 7
      series_v2: 278

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

  API Keys status
  ===============
    API key ending with xxxxx: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - xxxxx

==========
Logs Agent
==========
    Reliable: Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.com on port 443
    BytesSent: 4.112809e+07
    EncodedBytesSent: 5.962278e+06
    LogsProcessed: 2573
    LogsSent: 5437
    CoreAgentProcessOpenFiles: 36
    OSFileLimit: 10000

  ============
  Integrations
  ============

  xxxxxxx
  ----------------
    - Type: udp
      Port: 10518
      Service: xxxxx
      Source: python
      Status: OK

      Bytes Read: 1946308
      Pipeline Latency:
        Average Latency (ms): 0
        24h Average Latency (ms): 0
        Peak Latency (ms): 10
        24h Peak Latency (ms): 10

=============
Process Agent
=============

  Version: 7.48.1
  Status date: 2023-10-30 12:34:48.552 UTC (1698669288552)
  Process Agent Start: 2023-10-30 11:25:02.584 UTC (1698665102584)
  Pid: 61
  Go Version: go1.20.8
  Build arch: amd64
  Log Level: error
  Enabled Checks: [process rtprocess]
  Allocated Memory: 16,323,872 bytes
  Hostname: xxxxxxx
  System Probe Process Module Status: Not running
  Process Language Detection Enabled: False

  =================
  Process Endpoints
  =================
    https://process.datadoghq.com - API Key ending with:
        - xxxxx

  =========
  Collector
  =========
    Last collection time: 2023-10-30 12:34:45
    Docker socket:
    Number of processes: 8
    Number of containers: 0
    Process Queue length: 0
    RTProcess Queue length: 0
    Connections Queue length: 0
    Event Queue length: 0
    Pod Queue length: 0
    Process Bytes enqueued: 0
    RTProcess Bytes enqueued: 0
    Connections Bytes enqueued: 0
    Event Bytes enqueued: 0
    Pod Bytes enqueued: 0
    Drop Check Payloads: []

=========
APM Agent
=========
  Status: Running
  Pid: 60
  Uptime: 4186 seconds
  Mem alloc: 20,705,048 bytes
  Hostname: xxxxxxx
  Receiver: localhost:8126
  Endpoints:
    https://trace.agent.datadoghq.com

  Receiver (previous minute)
  ==========================
    From python 3.11.5 (CPython), client 2.1.3
      Traces received: 72 (263,898 bytes)
      Spans received: 360

    Priority sampling rate for 'service:xxxxxx,env:prod': 100.0%
    Priority sampling rate for 'service:yyyyyy,env:prod': 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

==========
Aggregator
==========
  Checks Metric Sample: 445,983
  Dogstatsd Metric Sample: 109,587
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 278
  Series Flushed: 422,009
  Service Check: 1,118
  Service Checks Flushed: 1,394
  Database Monitoring Activity Samples: 416
  Database Monitoring Metadata Samples: 18
  Database Monitoring Query Metrics: 417
  Database Monitoring Query Samples: 2,014

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 109,586
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 12,638,519
  Udp Packet Reading Errors: 0
  Udp Packets: 57,811
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0
  Unterminated Metric Errors: 0

====================
Remote Configuration
====================

    Organization enabled: True
    API Key: Authorized
    Last error: None

====
OTLP
====

  Status: Not enabled
  Collector status: Not running

pkatiushyn commented 10 months ago

btw, the same problem happens with MySQL

jmeunier28 commented 10 months ago

Hi @sileht 👋 FYI, I fixed this in our backend so it should be safe to upgrade to 7.48 again for the postgres integration. This was the result of a bug introduced in 7.48, which will be fixed in the agent in the 7.50 release of the postgres integration. Sorry for the inconvenience, and thank you for reporting the bug to us!

@pkatiushyn for MySQL, what tag do you see being duplicated? Can you provide more details?

pkatiushyn commented 10 months ago

The problem with the mysql check also appeared after 7.48. Here is the check config:

---
instances:
- host: d01.xxxxxxx.us-west-2.rds.amazonaws.com
  username: dbuser
  password: dbpass
  tags:
    - dbclusteridentifier:cluster1
    - dbinstanceidentifier:d01
  options:
    replication: 1
    extra_status_metrics: 1
    extra_innodb_metrics: 1
- host: d00.xxxxxxx.us-west-2.rds.amazonaws.com
  username: dbuser
  password: dbpass
  tags:
    - dbclusteridentifier:cluster1
    - dbinstanceidentifier:d00
  options:
    replication: 1
    extra_status_metrics: 1
    extra_innodb_metrics: 1

init_config: {}
logs: []

Then, looking at the mysql.performance.queries metric and grouping by dbinstanceidentifier, I see the following groups:

dbinstanceidentifier in avg:mysql.performance.queries{dbclusteridentifier:cluster1}
d00
d00,d01
d01

The weird d00,d01 value in dbinstanceidentifier is the wrong one.
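
For anyone who wants to check their own metrics for the same symptom, here is a minimal sketch that flags tag values which look like several values joined by a comma. The function name split_merged_tag_values is hypothetical and not part of any Datadog API. It also assumes the group values have been copied by hand from a "group by" listing like the one above, and that legitimate tag values never contain a comma.

```python
from typing import Dict, Iterable, List


def split_merged_tag_values(values: Iterable[str]) -> Dict[str, List[str]]:
    """Map each comma-containing tag value to its comma-separated parts.

    Hypothetical helper: values without a comma are treated as normal
    and are left out of the result.
    """
    merged: Dict[str, List[str]] = {}
    for value in values:
        parts = [part.strip() for part in value.split(",")]
        if len(parts) > 1:
            merged[value] = parts
    return merged


# Groups seen for dbinstanceidentifier in the MySQL report above.
mysql_groups = ["d00", "d00,d01", "d01"]
print(split_merged_tag_values(mysql_groups))

# db tag values seen in the original Postgres report.
postgres_groups = ["db_one", "db_two,db_one"]
print(split_merged_tag_values(postgres_groups))
```

Any key in the returned mapping is a candidate for the merged-tag problem described in this issue. It says nothing about the cause, only that the value has the same shape as the bad ones reported here.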