dalibo / temboard

PostgreSQL Remote Control
https://labs.dalibo.com/temboard
Other

Undefined metrics after agent upgrade to version 7.10 #996

Closed ccalimeero closed 2 years ago

ccalimeero commented 2 years ago

Hi,

After upgrading the agent from version 7.5 to 7.10 on a PostgreSQL 11 database server, many metrics return undefined values:

Btree Index bloat ratio
Heap bloat ratio
Temporary files size

I don't find any errors in the agent log file:

2022-03-02 10:22:12,064 temboardagent[14584]: [api] INFO: Starting discovery.
2022-03-02 10:22:12,072 temboardagent[14584]: [api] INFO: Discovery done.
2022-03-02 10:22:12,072 temboardagent[14584]: [httpd] INFO: client: 172.20.2.3 request: "GET /discover HTTP/1.1" 200 - 12.73ms
2022-03-02 10:22:12,198 temboardagent[14584]: [httpd] INFO: client: 172.20.2.3 request: "GET /monitoring/history?key=1e3676000115a34768b58ecfe18b3bf4&limit=100&start=2022-03-02T09:20:29Z HTTP/1.1" 200 - 3.60ms
2022-03-02 10:22:30,064 temboardagent[996]: [monitoring] INFO: Starting monitoring collector.
2022-03-02 10:22:30,066 temboardagent[996]: [monitoring] INFO: Gathering host information.
2022-03-02 10:22:30,085 temboardagent[996]: [monitoring] INFO: Load the probes to run.
2022-03-02 10:22:30,130 temboardagent[996]: [probes] INFO: Running probes at 2022-03-02T10:22:30.130518+01:00.
2022-03-02 10:22:30,131 temboardagent[996]: [probes] INFO: Running instance probe sessions.
2022-03-02 10:22:30,134 temboardagent[996]: [probes] INFO: Running instance probe xacts.
2022-03-02 10:22:30,230 temboardagent[996]: [probes] INFO: Running instance probe locks.
2022-03-02 10:22:30,248 temboardagent[996]: [probes] INFO: Running instance probe blocks.
2022-03-02 10:22:30,327 temboardagent[996]: [probes] INFO: Running instance probe bgwriter.
2022-03-02 10:22:30,336 temboardagent[996]: [probes] INFO: Running instance probe db_size.
2022-03-02 10:22:30,356 temboardagent[996]: [probes] INFO: Running instance probe tblspc_size.
2022-03-02 10:22:30,377 temboardagent[996]: [probes] INFO: Running host probe filesystems_size.
2022-03-02 10:22:30,381 temboardagent[996]: [probes] INFO: Running host probe cpu.
2022-03-02 10:22:30,389 temboardagent[996]: [probes] INFO: Running host probe process.
2022-03-02 10:22:30,397 temboardagent[996]: [probes] INFO: Running host probe memory.
2022-03-02 10:22:30,398 temboardagent[996]: [probes] INFO: Running host probe loadavg.
2022-03-02 10:22:30,398 temboardagent[996]: [probes] INFO: Running instance probe wal_files.
2022-03-02 10:22:30,408 temboardagent[996]: [probes] INFO: Running instance probe replication_lag.
2022-03-02 10:22:30,408 temboardagent[996]: [probes] INFO: Running instance probe temp_files_size_delta.
2022-03-02 10:22:30,501 temboardagent[996]: [probes] INFO: Running instance probe replication_connection.
2022-03-02 10:22:30,502 temboardagent[996]: [probes] INFO: Running database probe heap_bloat.
2022-03-02 10:22:30,991 temboardagent[996]: [probes] INFO: Running database probe btree_bloat.
2022-03-02 10:22:31,210 temboardagent[996]: [probes] INFO: Finished probes run.
2022-03-02 10:22:31,210 temboardagent[996]: [monitoring] INFO: Add data to metrics table.
2022-03-02 10:22:31,226 temboardagent[996]: [monitoring] INFO: Collect done.
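To see how long each probe took in a run like the one above, the timestamps of consecutive `[probes]` lines can be diffed. This is a minimal sketch, assuming the probes run one after another (as the log ordering suggests) and that every probe line follows the format shown; `probe_durations` is a hypothetical helper, not part of temboard. On the excerpt above it would attribute about 489 ms to `heap_bloat` and 219 ms to `btree_bloat`.

```python
import re
from datetime import datetime, timedelta

# Matches agent lines such as:
# 2022-03-02 10:22:30,502 temboardagent[996]: [probes] INFO: Running database probe heap_bloat.
# 2022-03-02 10:22:31,210 temboardagent[996]: [probes] INFO: Finished probes run.
PROBE_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+: \[probes\] INFO: "
    r"(?:Running (?P<scope>\w+) probe (?P<name>\w+)|Finished probes run)\.$"
)


def probe_durations(lines):
    """Approximate each probe's run time (ms) as the gap to the next probe line.

    Assumption: probes run sequentially, so the next "Running ..." or
    "Finished probes run." line marks the end of the previous probe.
    """
    events = []
    for line in lines:
        m = PROBE_LINE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m["scope"], m["name"]))

    durations = {}
    for (ts, scope, name), (next_ts, _, _) in zip(events, events[1:]):
        if name is not None:  # skip the "Finished probes run." marker
            durations[f"{scope}/{name}"] = (next_ts - ts) / timedelta(milliseconds=1)
    return durations
```

The gaps are wall-clock time between log lines, so they include logging overhead and are only a rough guide to which probes are expensive.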

No errors in the temboard UI log file either:

2022-03-02 10:23:36,747 temboardui[459]: [statements] INFO: Pulling statements from SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:23:39,596 temboardui[459]: [statements] INFO: Successfully pulled statements data for SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:06,426 temboardui[27356]: [access] INFO: 200 GET /server/SONATE-P-PG-01.localdomain/2345/monitoring/unavailability?start=2022-03-02T08:24:05.921Z&end=2022-03-02T09:24:05.921Z (10.121.23.250) 104.16ms
2022-03-02 10:24:06,429 temboardui[27356]: [access] INFO: 200 GET /server/SONATE-P-PG-01.localdomain/2345/monitoring/data/load1?start=2022-03-02T08:24:05.928Z&end=2022-03-02T09:24:05.928Z (10.121.23.250) 105.41ms
2022-03-02 10:24:06,483 temboardui[27356]: [access] INFO: 200 GET /server/SONATE-P-PG-01.localdomain/2345/monitoring/data/tps?start=2022-03-02T08:24:05.921Z&end=2022-03-02T09:24:05.921Z (10.121.23.250) 163.03ms
2022-03-02 10:24:12,640 temboardui[601]: [monitoring] INFO: Scheduling collector for agent SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:13,756 temboardui[615]: [monitoring] INFO: Starting collector for SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:13,756 temboardui[615]: [monitoring] INFO: Discovering hostname from agent SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:13,820 temboardui[615]: [monitoring] INFO: Found host #10 and instance #10 for agent SONATE-P-PG-01.localdomain:2345, hostname SONATE-P-PG-01.localdomain.
2022-03-02 10:24:13,892 temboardui[615]: [monitoring] INFO: Querying monitoring history from agent SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:13,923 temboardui[615]: [monitoring] INFO: Got points for SONATE-P-PG-01.localdomain:2345 at 2022-03-02 09:23:32 +0000.
2022-03-02 10:24:13,923 temboardui[615]: [monitoring] INFO: Update the inventory for SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:13,944 temboardui[615]: [monitoring] INFO: Insert instance availability for SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:13,948 temboardui[615]: [monitoring] INFO: Insert collected metrics for SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:14,524 temboardui[615]: [monitoring] INFO: Populate checks for host: SONATE-P-PG-01.localdomain.
2022-03-02 10:24:14,576 temboardui[615]: [monitoring] INFO: Apply alerting checks against preprocessed data for agent SONATE-P-PG-01.localdomain:2345.
2022-03-02 10:24:15,907 temboardui[615]: [monitoring] INFO: End of collector for agent SONATE-P-PG-01.localdomain:2345.

I tried deleting the instance and re-registering it in the temboard console, but that did not change anything.

Do you have any idea what I might be missing?

Thank you in advance for your help.

Cyril

bersace commented 2 years ago

Hello Cyril,

Can you enable DEBUG logs on the agent? Do you see any errors in the monitored Postgres logs?
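Raising the agent log level is a configuration change. A minimal Python sketch, assuming the agent reads an INI-style configuration file with a `[logging]` section whose `level` key accepts `DEBUG` (check the `temboard-agent.conf` shipped with your version); `enable_debug_logging` is a hypothetical helper, not part of temboard.

```python
import configparser
import io


def enable_debug_logging(conf_text: str) -> str:
    """Return the configuration text with [logging] level set to DEBUG.

    Assumption: the agent config is INI-style with a [logging] section and a
    `level` key. ConfigParser drops comments when writing, so keep a backup.
    """
    # interpolation=None so values containing '%' are kept verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("logging"):
        parser.add_section("logging")
    parser.set("logging", "level", "DEBUG")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Because comments are lost on write, editing the single `level` line by hand is just as valid; the sketch mainly shows which setting is involved.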

ccalimeero commented 2 years ago

Hello Étienne,

After uninstalling the agent and then reinstalling it, the problem is no longer present on my first server.

I think I hit an upgrade issue that I haven't identified. Starting from scratch was the cleanest option for me. All is good now for this PostgreSQL instance.

On my second instance, however, the problem persists, with some differences. This instance is a physical replica of my first server. In the agent configuration for this instance I set the probes as follows:

probes = locks,process,db_size,tblspc_size,sessions,blocks,xacts,replication,loadavg,filesystems_size,cpu,bgwriter,memory

because it is recommended to disable the "wal_files" probe on a standby server. When I restart the agent, all the PostgreSQL-related probes go to UNDEF.
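The `probes` list above can be compared against the probe names the 7.10 agent actually logged on the first server. A minimal sketch, assuming the setting is a plain comma-separated list and that the names in that agent log are the valid probe names; `check_probes` is a hypothetical helper, and whether the agent ignores or rejects unknown names is not verified here.

```python
# Probe names as they appear in the 7.10 agent log of the first server.
LOGGED_PROBES = {
    "sessions", "xacts", "locks", "blocks", "bgwriter", "db_size",
    "tblspc_size", "filesystems_size", "cpu", "process", "memory",
    "loadavg", "wal_files", "replication_lag", "temp_files_size_delta",
    "replication_connection", "heap_bloat", "btree_bloat",
}


def check_probes(setting, known=LOGGED_PROBES):
    """Return (unknown, omitted) probe names for a comma-separated setting.

    `unknown` lists configured names that never appear in the log;
    `omitted` lists logged probes that the setting leaves out.
    """
    configured = {name.strip() for name in setting.split(",") if name.strip()}
    if configured == {"*"}:  # wildcard enables everything
        return set(), set()
    return configured - known, known - configured
```

With the setting above, `replication` comes back as unknown (the log only shows `replication_lag` and `replication_connection`), and `heap_bloat`, `btree_bloat` and `temp_files_size_delta` are among the omitted probes, the same metrics reported as undefined at the top of this issue.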

On the other hand, if I set probes to * (all), all the metrics come back except "streaming replication connection", which remains UNDEF. For that one I do find an error in the Postgres logs:

2022-03-02 11:37:02.905 CET [8792] ERROR: syntax error at or near "CASE" at character 34
2022-03-02 11:37:02.905 CET [8792] STATEMENT: SELECT '172.20.2.40' AS upstream CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END AS connected FROM pg_stat_wal_receiver WHERE status='streaming' AND conninfo LIKE '%host=172.20.2.40%'

There seems to be a missing "," between AS upstream and the CASE expression: character 34 is exactly where CASE starts. With the comma added, the query runs fine:

postgres=# SELECT '172.20.2.40' AS upstream,
postgres-# CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END AS connected
postgres-# FROM pg_stat_wal_receiver
postgres-# WHERE status='streaming' AND
postgres-# conninfo LIKE '%host=172.20.2.40%';
  upstream   | connected
-------------+-----------
 172.20.2.40 |         1
(1 row)
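The corrected statement can be produced by a small builder that keeps the comma after `AS upstream`. This is a hypothetical helper for illustration, not the code from dalibo/temboard#1021. Like the logged statement, it interpolates the upstream address directly into the SQL, which is only acceptable for trusted configuration values.

```python
def replication_connection_query(upstream: str) -> str:
    """Build the streaming-replication check from the log, comma included.

    Hypothetical helper, not temboard's implementation. `upstream` is
    interpolated into the SQL text, so it must come from trusted config.
    """
    return (
        f"SELECT '{upstream}' AS upstream, "  # the comma missing in the 7.10 probe
        "CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END AS connected "
        "FROM pg_stat_wal_receiver "
        f"WHERE status='streaming' AND conninfo LIKE '%host={upstream}%'"
    )
```

When a database driver is available, passing the address as a bound query parameter instead of formatting it into the string avoids the quoting problem entirely.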

Cyril

bersace commented 2 years ago

Yep, the missing comma is fixed in dalibo/temboard#1021

ccalimeero commented 2 years ago

Thanks a lot, Etienne. Sorry, I hadn't seen that topic.

Everything is OK for me, so I'm closing the topic.

Cyril