DataDog / integrations-core

Core integrations of the Datadog Agent
BSD 3-Clause "New" or "Revised" License

Can't enable DB monitoring collect_schemas feature: job "database-metadata" crashing. #16498

Open se-ipsip opened 9 months ago

se-ipsip commented 9 months ago

Output of the info page

Getting the status from the agent.

===============
Agent (v7.49.1)
===============

  Status date: 2023-12-27 11:08:42.991 UTC (1703675322991)
  Agent start: 2023-12-27 11:07:33.663 UTC (1703675253663)
  Pid: 1
  Go Version: go1.20.10
  Python Version: 3.9.18
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System time: 2023-12-27 11:08:42.991 UTC (1703675322991)

  Host Info
  =========
    bootTime: 2023-12-20 09:54:14 UTC (1703066054000)
    hostId: <redacted>
    kernelArch: x86_64
    kernelVersion: 6.1.58+
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 23.04
    procs: 4
    uptime: 169h14m20s
    virtualizationRole: guest

  Hostnames
  =========
<redacted>

  Metadata
  ========

=========
Collector
=========

  Running Checks
  ==============

    postgres (15.1.1)
    -----------------
      Instance ID: postgres:5e98737379db6dc5 [OK]
      Configuration Source: kube_services:kube_service://datadog/datadog-cloudsql-proxy
      Total Runs: 3
      Metric Samples: Last Run: 307, Total: 814
      Events: Last Run: 0, Total: 0
      Database Monitoring Activity Samples: Last Run: 1, Total: 3
      Database Monitoring Metadata Samples: Last Run: 1, Total: 3
      Database Monitoring Query Metrics: Last Run: 1, Total: 3
      Database Monitoring Query Samples: Last Run: 27, Total: 62
      Service Checks: Last Run: 1, Total: 3
      Average Execution Time : 328ms
      Last Execution Date : 2023-12-27 11:08:31 UTC (1703675311000)
      Last Successful Execution Date : 2023-12-27 11:08:31 UTC (1703675311000)

      Instance ID: postgres:73bd4a61bbd12aab [OK]
      Configuration Source: kube_services:kube_service://datadog/datadog-cloudsql-proxy
      Total Runs: 2
      Metric Samples: Last Run: 350, Total: 468
      Events: Last Run: 0, Total: 0
      Database Monitoring Activity Samples: Last Run: 2, Total: 2
      Database Monitoring Metadata Samples: Last Run: 1, Total: 3
      Database Monitoring Query Metrics: Last Run: 2, Total: 2
      Database Monitoring Query Samples: Last Run: 9, Total: 9
      Service Checks: Last Run: 1, Total: 2
      Average Execution Time : 118ms
      Last Execution Date : 2023-12-27 11:08:23 UTC (1703675303000)
      Last Successful Execution Date : 2023-12-27 11:08:23 UTC (1703675303000)

==========
Aggregator
==========
  Checks Metric Sample: 1,450
  Dogstatsd Metric Sample: 1
  Event: 1
  Events Flushed: 1
  Number Of Flushes: 3
  Series Flushed: 313
  Service Check: 5
  Service Checks Flushed: 6
  Database Monitoring Activity Samples: 7
  Database Monitoring Metadata Samples: 7
  Database Monitoring Query Metrics: 6
  Database Monitoring Query Samples: 96
==========
Endpoints
==========
  https://app.datadoghq.eu - API Key ending with:
      - <redacted>

=====================
Datadog Cluster Agent
=====================

  - Datadog Cluster Agent endpoint detected: https://<redacted>:5005
  Successfully connected to the Datadog Cluster Agent.
  - Running: 7.49.1+commit.1790cab

=============
Autodiscovery
=============
  Enabled Features
  ================
    kubernetes

Additional environment details (Operating System, Cloud provider, etc): GCP CloudSQL Postgres (PGSQL 14); Datadog Agent running on GKE Autopilot, deployed with Helm; Datadog connects via CloudSQL Proxy.

Steps to reproduce the issue:

  1. Configure for Database Monitoring
     {
        "postgres": {
          "init_config": {},
          "instances": [
            {
              "host": "datadog-cloudsql-proxy",
              "reported_hostname": "gcpsql-01",
              "port": 5432,
              "dbstrict": true,
              "username": "<username>",
              "dbname": "db01",
              "dbm": true,
              "relations": [
                "relation_regex: .*"
              ]
            }
          ]
        }
      } 
  2. Enable the schema collection feature (https://docs.datadoghq.com/database_monitoring/setup_postgres/gcsql/?tab=kubernetes#collecting-schemas) by adding the following to the config:
    "collect_schemas": { "enabled": true }
  3. The following error stack trace appears in the datadog-agent-clusterchecks container:
    2023-12-27 10:56:14 UTC | CORE | ERROR | (pkg/collector/python/datadog_agent.go:129 in LogMessage) | postgres:8f8c6091ed5a240d | (utils.py:327) | [kube_service:datadog-cloudsql-proxy,env:demo,service:postgres,kube_namespace:datadog,cluster_name:***, kube_cluster_name:***,server:datadog-cloudsql-proxy,port:5432,db:postgres,dd.internal.resource:database_instance:gcpsql-01,job:database-metadata] Job loop crash
    Traceback (most recent call last):
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/db/utils.py", line 305, in _job_loop
        self._run_job_rate_limited()
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/db/utils.py", line 344, in _run_job_rate_limited
        self._run_job_traced()
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/db/utils.py", line 350, in _run_job_traced
        return self.run_job()
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 216, in run_job
        self.report_postgres_metadata()
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/base/utils/tracking.py", line 71, in wrapper
        result = function(self, *args, **kwargs)
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 242, in report_postgres_metadata
        metadata = self._collect_schema_info()
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 274, in _collect_schema_info
        metadata.append(self._collect_metadata_for_database(database))
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 461, in _collect_metadata_for_database
        tables_info = self._query_table_information_for_schema(cursor, schema['id'], dbname)
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 396, in _query_table_information_for_schema
        tables_info = self._get_table_info(cursor, dbname, schema_id)
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 324, in _get_table_info
        table_info = self._filter_tables_with_no_relation_metrics(dbname, table_info)
      File "/opt/datadog-agent/embedded/lib/python3.9/site-packages/datadog_checks/postgres/metadata.py", line 338, in _filter_tables_with_no_relation_metrics
        if table['name'] in cache[dbname].keys():
    KeyError: 'db01'
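
The last frame of the traceback fails on a plain dictionary lookup: `cache[dbname]` raises `KeyError` when the cache holds no entry for the database. A minimal sketch of that failure mode, assuming the cache is a dict keyed by database name; the function names and sample data below are hypothetical and are not the integration's actual code:

```python
def filter_tables_strict(cache, dbname, tables):
    # Same lookup shape as the failing line in the traceback:
    # raises KeyError when dbname has no entry in the cache.
    return [t for t in tables if t["name"] in cache[dbname].keys()]


def filter_tables_defensive(cache, dbname, tables):
    # Hypothetical alternative: treat a missing database entry as
    # "no relation metrics recorded yet" instead of crashing.
    known = cache.get(dbname, {})
    return [t for t in tables if t["name"] in known]


# Hypothetical sample data: the cache knows "postgres" but not "db01".
tables = [{"name": "users"}, {"name": "orders"}]
cache = {"postgres": {"users": 1}}

try:
    filter_tables_strict(cache, "db01", tables)
except KeyError as exc:
    print("KeyError:", exc)

print(filter_tables_defensive(cache, "db01", tables))
```

Whether a defensive lookup is the right fix depends on why the entry is missing; here it only illustrates why the job loop crashes rather than returning an empty result.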

Describe the results you received: The Schema (BETA) tab on the APM DB Monitoring page is still empty.

Describe the results you expected: Schema information is collected and shown on the APM DB Monitoring page.

edjshelton commented 1 month ago

I believe

"relations": [
  "relation_regex: .*"
]

should be

"relations": [
  {
    "relation_regex": ".*"
  }
]
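
A quick way to catch this kind of mistake before deploying is to scan the `relations` list for string entries that look like a flattened `key: value` pair. The sketch below assumes each entry is meant to be either a plain relation name or a mapping keyed by `relation_name` or `relation_regex`; the helper `find_suspicious_relations` is hypothetical and not part of the Datadog Agent:

```python
import re

# Keys assumed to be valid in a mapping-style relations entry.
MAPPING_KEYS = ("relation_name", "relation_regex")


def find_suspicious_relations(relations):
    """Return string entries that look like a flattened 'key: value' mapping."""
    pattern = re.compile(r"^\s*(%s)\s*:" % "|".join(MAPPING_KEYS))
    return [r for r in relations if isinstance(r, str) and pattern.match(r)]


original = ["relation_regex: .*"]        # entry from the issue's config
corrected = [{"relation_regex": ".*"}]   # entry suggested in the comment above

print(find_suspicious_relations(original))   # ['relation_regex: .*']
print(find_suspicious_relations(corrected))  # []
```

A string entry would presumably be read as a literal relation name, so it would match no tables, which may explain why no relation metrics were cached for `db01`.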