DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] postgres configuration item collect_wal_metrics doesn't prevent attempts to collect metrics if `false` #23446

Closed estokes-vs closed 5 months ago

estokes-vs commented 6 months ago

We updated our datadog-agent from version 7.44.1 to 7.50.1 a while back, and since that update we have been getting exceptions like the one below. For context, our agent connects to several Postgres instances that are maintained by a Managed Service Provider, so we have restricted access to some settings due to their security posture. The function pg_ls_waldir is one of those restricted items.

We have never enabled collect_wal_metrics in the past, and we even tried setting collect_wal_metrics: false explicitly in the conf.yaml file, but we still get these exceptions in the logs because the Agent tries to access this function. We are likely not the only customers using databases hosted by a Managed Service Provider with a strict security posture, so it would make sense to allow us to disable this feature.

We understand that the documentation asks for GRANT pg_monitor TO <DATADOG_USER>; however, we are not able to grant this role because it contains pg_read_all_settings, which exposes sensitive platform information. We are able to grant pg_read_all_stats and pg_stat_scan_tables, but it is likely that other MSPs do not allow pg_read_all_settings either, due to security concerns.

postgres:40afb37e3790221a | (postgres.py:239) | Unhandled exception while using database connection core
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/postgres/postgres.py", line 224, in db
    yield self._db
  File "/opt/datadog-agent/embedded/lib/python3.11/site-packages/datadog_checks/postgres/postgres.py", line 207, in execute_query_raw
    cursor.execute(query)
psycopg2.errors.InsufficientPrivilege: permission denied for function pg_ls_waldir
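
To confirm the same condition on another instance, a minimal preflight sketch follows. It assumes a DB-API cursor that uses the %s parameter style (psycopg2 is one example) and a hypothetical role name such as `datadog`. The helper `can_list_wal_dir` is illustrative and not part of the Agent; it relies only on PostgreSQL's built-in `has_function_privilege`.

```python
# Hypothetical helper for illustration; not part of the Datadog Agent.
CHECK_SQL = "SELECT has_function_privilege(%s, 'pg_ls_waldir()', 'EXECUTE')"


def can_list_wal_dir(cursor, role):
    """Return True if `role` is allowed to execute pg_ls_waldir().

    `cursor` is any DB-API cursor using the %s paramstyle (e.g. psycopg2).
    """
    cursor.execute(CHECK_SQL, (role,))
    row = cursor.fetchone()
    return bool(row and row[0])
```

If this returns False for the monitoring role, any query that calls pg_ls_waldir() as that role would be expected to fail with the InsufficientPrivilege error shown above.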

Agent Environment: 7.44.1 (did not encounter this issue); 7.50.1 (began running into this issue).

Describe what happened: Updated the agent from 7.44.1 to 7.50.1 and began getting many errors like the one shown above.

Describe what you expected: If collect_wal_metrics is supposed to default to false, or is hardcoded to false, the Agent should not be trying to collect these WAL metrics and should not try to access the pg_ls_waldir function.
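
The expected behavior could be sketched roughly as follows. This is only an illustration of the gating described here, not the Agent's actual implementation: `queries_to_run`, `WAL_QUERY`, and the placeholder SQL are hypothetical names, and `instance` stands for one parsed entry from conf.yaml.

```python
# Illustrative only; names and the query text are assumptions, not Agent code.
WAL_QUERY = "SELECT count(*) FROM pg_ls_waldir()"  # placeholder WAL query


def queries_to_run(instance, base_queries):
    """Return the queries for one check run, honoring collect_wal_metrics.

    A missing key is treated the same as an explicit `false`, so the
    pg_ls_waldir() query is only added when the option is set to true.
    """
    queries = list(base_queries)
    if instance.get("collect_wal_metrics", False) is True:
        queries.append(WAL_QUERY)
    return queries
```

Under this sketch, an instance that omits the option or sets it to false never issues a query touching pg_ls_waldir, so the restricted role would not hit the permission error.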

Steps to reproduce the issue: Remove the agent's ability to access pg_ls_waldir, either by running REVOKE pg_read_all_settings or by granting pg_read_all_stats and pg_stat_scan_tables instead of GRANT pg_monitor, and then run the agent configured with collect_wal_metrics: false.

Additional environment details (Operating System, Cloud provider, etc): Linux, Aiven's Managed Service Platform.

ehamberg commented 5 months ago

Still present in v7.52.0. Seeing this on DigitalOcean's managed PostgreSQL clusters.

lu-zhengda commented 5 months ago

Thanks for submitting the issue. This bug is fixed in PR https://github.com/DataDog/integrations-core/pull/16990 and will be released in datadog-agent v7.53.0.

ashkilniuk commented 4 months ago

still present in 7.53.0

kteague-tasktop commented 3 months ago

still present in 7.54.0

pscheit commented 3 months ago

@lu-zhengda any chance this didn't make the release? I can't find the changelog entry from the merged PR in the log here: https://github.com/DataDog/datadog-agent/blob/main/CHANGELOG.rst

edit: oh don't mind me, found it in the integration changelog.

Upgrading to 7.54.x and setting collect_wal_metrics: false did solve it.

gpatounas commented 1 week ago

collect_wal_metrics default is documented as false in https://github.com/DataDog/integrations-core/blob/master/postgres/datadog_checks/postgres/data/conf.yaml.example.

This does not appear to be the case: to disable it, explicitly setting it to false is required.
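
For anyone auditing their own setup, here is a small sketch that lists which instances rely on the default rather than an explicit value. It assumes the conf.yaml has already been parsed into a dict with a top-level `instances` list (for example by a YAML loader); the helper name is hypothetical.

```python
def instances_without_explicit_wal_setting(config):
    """Return the indexes of instances that do not set collect_wal_metrics."""
    instances = config.get("instances") or []
    return [
        index
        for index, instance in enumerate(instances)
        if "collect_wal_metrics" not in (instance or {})
    ]


# Example with an already-parsed config (host names are made up).
parsed = {
    "instances": [
        {"host": "db-a", "collect_wal_metrics": False},
        {"host": "db-b"},
    ]
}
print(instances_without_explicit_wal_setting(parsed))  # prints [1]
```

Any index it reports is an instance where adding collect_wal_metrics: false explicitly may be worth trying.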