Closed kyle-rf closed 1 year ago
Did you run ALTER EXTENSION pgnodemx UPDATE;
as a superuser in each of your applicable databases?
Did you run
ALTER EXTENSION pgnodemx UPDATE;
as a superuser in each of your applicable databases?
Thanks for the suggestion. I tried that but got the following error:
ERROR: extension "pgnodemx" does not exist
I listed the extensions:
\dx+
Objects in extension "pg_stat_statements"
Object description
---------------------------------------------------
function pg_stat_statements(boolean)
function pg_stat_statements_reset(oid,oid,bigint)
view pg_stat_statements
(3 rows)
Objects in extension "pgaudit"
Object description
---------------------------------------
event trigger pgaudit_ddl_command_end
event trigger pgaudit_sql_drop
function pgaudit_ddl_command_end()
function pgaudit_sql_drop()
(4 rows)
Objects in extension "plpgsql"
Object description
-------------------------------------------
function plpgsql_call_handler()
function plpgsql_inline_handler(internal)
function plpgsql_validator(oid)
language plpgsql
@kyle-rf I have been unable to reproduce the behavior you are seeing. Specifically, when I follow the steps you provided (i.e. to upgrade from v4.6.2 to v4.7.4), I see the pgnodemx
extension both before and after the upgrade.
And since the upgrade process does not touch the database itself, the extension should definitely be there both before and after the upgrade (assuming metrics was properly enabled/deployed prior to the upgrade). So just to clarify - per the output you provided above, are you seeing the pgnodemx
before the PGO and PG cluster upgrade to 4.7.4, but not after?
Additionally, I did want to suggest another approach. Specifically, if you remove metrics from a cluster:
$ pgo update cluster mycluster1 --disable-metrics
Adding or removing a metrics collection sidecar can cause downtime.
WARNING: Are you sure? (yes/no): yes
updated pgcluster mycluster1
...and then re-enable metrics, e.g.:
$ pgo update cluster mycluster1 --enable-metrics
Adding or removing a metrics collection sidecar can cause downtime.
WARNING: Are you sure? (yes/no): yes
updated pgcluster mycluster1
This should resolve the error you are seeing by ensuring all of the latest pgMonitor resources are created and/or updated (with the second step specifically ensuring metrics collection [and therefore the pgnodemx
extension] is properly configured and enabled for the cluster).
@andrewlecuyer I gave that a try, but exporter is still logging the error msg="Error running query on database \"127.0.0.1:5432\": ccp_nodemx_data_disk pq: query-specified return tuple and function return type are not compatible"
I have not seen the other error since though, so halfway there.
registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.1.0-0
time="2022-05-13T12:19:42Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_top_total pq: column s.total_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:19:42Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_top_mean pq: column s.mean_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:19:42Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_total pq: column s.total_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:19:42Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_disk_activity pq: query-specified return tuple and function return type are not compatible" source="postgres_exporter.go:1356"
time="2022-05-13T12:19:42Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_top_max pq: column s.max_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:19:42Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_data_disk pq: query-specified return tuple and function return type are not compatible" source="postgres_exporter.go:1356"
time="2022-05-13T12:19:42Z" level=error msg="queryNamespaceMappings returned 6 errors" source="postgres_exporter.go:1474"
time="2022-05-13T12:31:12Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_top_mean pq: column s.mean_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:31:12Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_top_total pq: column s.total_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:31:12Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_total pq: column s.total_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:31:12Z" level=info msg="Error running query on database \"localhost:5432\": ccp_pg_stat_statements_top_max pq: column s.max_exec_time does not exist" source="postgres_exporter.go:1356"
time="2022-05-13T12:31:12Z" level=error msg="queryNamespaceMappings returned 4 errors" source="postgres_exporter.go:1474"
Later I found this thread and noticed ALTER EXTENSION pgnodemx UPDATE;
But isn't the operator supposed to do this?
The https://github.com/CrunchyData/postgres-operator/pull/3400 should resolve these issues and is planned to be available in the next release, but not currently planned to be backpatched to v4. As noted, the manual fix should be to update the extensions used by the exporter, namely pgnodemx
and pg_stat_statements
.
@benjaminjb Sorry for writing in a closed issue, but we are seeing similar errors in our operator based openshift installation of crunchy. Operator version 5.2.0. - postgresoperator.v5.2.0.
time="2022-12-08T09:23:56Z" level=info msg="Established new database connection to \"localhost:5432\"." source="postgres_exporter.go:878"
time="2022-12-08T09:23:57Z" level=info msg="Semantic Version Changed on \"localhost:5432\": 0.0.0 -> 14.5.0" source="postgres_exporter.go:1405"
time="2022-12-08T09:24:01Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_disk_activity pq: query-specified return tuple and function return type are not compatible" source="postgres_exporter.go:1356"
time="2022-12-08T09:24:02Z" level=error msg="queryNamespaceMappings returned 1 errors" source="postgres_exporter.go:1474"
time="2022-12-08T09:24:02Z" level=info msg="Starting Server: :9187" source="postgres_exporter.go:1672"
time="2022-12-08T09:24:03Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_disk_activity pq: query-specified return tuple and function return type are not compatible" source="postgres_exporter.go:1356"
time="2022-12-08T09:24:03Z" level=error msg="queryNamespaceMappings returned 1 errors" source="postgres_exporter.go:1474"
Should this be fixed in our operator version? I have tried to find out in which operator version the fix is included, but unfortunally i was not able to match the repo tags to an operator version. Maybe you can give me a hint how to match the repo tags to an operator version.
The ALTER EXTENSION pgnodemx UPDATE;
is able to fix the issue, but as mentioned before, this should be done by the operator.
Hey @dzabel this won't be fixed in 5.2.0, but should be fixed in 5.3.0, which is coming soon -- but I don't know how soon. I'll drop a note here as soon as it is, just so you know.
Hey, just wanted to drop a line and say that 5.3.0 is out for our open source users -- hope that helps!
Please ensure you do the following when reporting a bug:
Note that some logs needed to troubleshoot may be found in the
/pgdata/<CLUSTERNAME>/pg_log
directory on your Postgres instance.An incomplete bug report can lead to delays in resolving the issue or the closing of a ticket, so please be as detailed as possible.
If you are looking for general support, please view the support page for where you can ask questions.
Thanks for reporting the issue, we're looking forward to helping you!
Overview
Upon upgrading PGO to 4.7.4 and upgrading a pg cluster. The metrics container for the cluster began logging errors:
Environment
Please provide the following details:
Steps to Reproduce
REPRO
Provide steps to get to the error condition: Have existing Prometheus Deployment Deploy PGO 4.6.2 Do NOT Deploy pgmonitor Create a cluster with metrics Ensure metrics are reaching the existing Prometheus instance should not be generating psql_scrape_errors yet Upgrade PGO to 4.7.4 Upgrade the pg cluster
EXPECTED
No scrape errors were expected. All metrics are going into non-pgmonitor Prometheus.
ACTUAL
Metrics are still reaching the non-pgmonitor Prometheus, but the exporter is now failing to retrieve some metrics with the following errors:
Logs
See above