Closed · schewara closed this issue 3 years ago
Describe the bug
Running a standalone Postgres instance with monitoring enabled causes memory usage to grow continuously until the configured limit is reached.
The affected metrics are `ccp_nodemx_mem_active_anon` and `ccp_nodemx_mem_rss`.
Restarting the postgres-exporter frees up the used memory immediately.
Further investigation showed that a specific query consistently took around 600 ms:
```sql
EXPLAIN ANALYZE
WITH all_backups AS (
    SELECT config_file
         , jsonb_array_elements(data) AS stanza_data
    FROM monitor.pgbackrest_info(10)
)
, per_stanza AS (
    SELECT config_file
         , stanza_data->>'name' AS stanza
         , jsonb_array_elements(stanza_data->'backup') AS backup_data
    FROM all_backups
)
SELECT a.config_file
     , a.stanza
     , a.backup_data->>'type' AS backup_type
     , a.backup_data->'info'->'repository'->>'delta' AS repo_backup_size_bytes
     , a.backup_data->'info'->'repository'->>'size' AS repo_total_size_bytes
     , (a.backup_data->'timestamp'->>'stop')::bigint
       - (a.backup_data->'timestamp'->>'start')::bigint AS backup_runtime_seconds
FROM per_stanza a
JOIN (
    SELECT config_file
         , stanza
         , backup_data->>'type' AS backup_type
         , max(backup_data->'timestamp'->>'start') AS max_backup_start
         , max(backup_data->'timestamp'->>'stop') AS max_backup_stop
    FROM per_stanza
    GROUP BY 1, 2, 3
) b ON a.config_file = b.config_file
   AND a.stanza = b.stanza
   AND a.backup_data->>'type' = b.backup_type
   AND a.backup_data->'timestamp'->>'start' = b.max_backup_start
   AND a.backup_data->'timestamp'->>'stop' = b.max_backup_stop;
```

```
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=2626697.50..3279629.74 rows=31 width=168) (actual time=630.971..631.104 rows=2 loops=1)
   Hash Cond: ((a.config_file = per_stanza.config_file) AND (a.stanza = per_stanza.stanza) AND ((a.backup_data ->> 'type'::text) = ((per_stanza.backup_data ->> 'type'::text))) AND (((a.backup_data -> 'timestamp'::text) ->> 'start'::text) = (max(((per_stanza.backup_data -> 'timestamp'::text) ->> 'start'::text)))) AND (((a.backup_data -> 'timestamp'::text) ->> 'stop'::text) = (max(((per_stanza.backup_data -> 'timestamp'::text) ->> 'stop'::text)))))
   CTE all_backups
     ->  ProjectSet  (cost=0.25..517.75 rows=100000 width=64) (actual time=630.327..630.328 rows=1 loops=1)
           ->  Function Scan on pgbackrest_info  (cost=0.25..10.25 rows=1000 width=64) (actual time=630.308..630.308 rows=1 loops=1)
   CTE per_stanza
     ->  ProjectSet  (cost=0.00..53250.00 rows=10000000 width=96) (actual time=630.337..630.340 rows=6 loops=1)
           ->  CTE Scan on all_backups  (cost=0.00..2000.00 rows=100000 width=64) (actual time=630.329..630.330 rows=1 loops=1)
   ->  CTE Scan on per_stanza a  (cost=0.00..200000.00 rows=10000000 width=96) (actual time=630.339..630.340 rows=6 loops=1)
   ->  Hash  (cost=2527968.75..2527968.75 rows=1000000 width=160) (actual time=0.208..0.208 rows=2 loops=1)
         Buckets: 65536  Batches: 32  Memory Usage: 512kB
         ->  HashAggregate  (cost=2134375.00..2517968.75 rows=1000000 width=160) (actual time=0.072..0.104 rows=2 loops=1)
               Group Key: per_stanza.config_file, per_stanza.stanza, (per_stanza.backup_data ->> 'type'::text)
               Planned Partitions: 64  Batches: 1  Memory Usage: 409kB
               ->  CTE Scan on per_stanza  (cost=0.00..225000.00 rows=10000000 width=128) (actual time=0.002..0.009 rows=6 loops=1)
 Planning Time: 0.267 ms
 JIT:
   Functions: 38
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 7.328 ms, Inlining 16.180 ms, Optimization 379.258 ms, Emission 234.166 ms, Total 636.932 ms
 Execution Time: 638.649 ms
(21 rows)
```
After further investigation I noticed the JIT block at the bottom of the plan, which accounts for almost the entire runtime (636.932 ms of 638.649 ms). Disabling `jit` massively improved the execution time:
```sql
SET jit = off;

EXPLAIN ANALYZE
WITH all_backups AS (
    SELECT config_file
         , jsonb_array_elements(data) AS stanza_data
    FROM monitor.pgbackrest_info(10)
)
, per_stanza AS (
    SELECT config_file
         , stanza_data->>'name' AS stanza
         , jsonb_array_elements(stanza_data->'backup') AS backup_data
    FROM all_backups
)
SELECT a.config_file
     , a.stanza
     , a.backup_data->>'type' AS backup_type
     , a.backup_data->'info'->'repository'->>'delta' AS repo_backup_size_bytes
     , a.backup_data->'info'->'repository'->>'size' AS repo_total_size_bytes
     , (a.backup_data->'timestamp'->>'stop')::bigint
       - (a.backup_data->'timestamp'->>'start')::bigint AS backup_runtime_seconds
FROM per_stanza a
JOIN (
    SELECT config_file
         , stanza
         , backup_data->>'type' AS backup_type
         , max(backup_data->'timestamp'->>'start') AS max_backup_start
         , max(backup_data->'timestamp'->>'stop') AS max_backup_stop
    FROM per_stanza
    GROUP BY 1, 2, 3
) b ON a.config_file = b.config_file
   AND a.stanza = b.stanza
   AND a.backup_data->>'type' = b.backup_type
   AND a.backup_data->'timestamp'->>'start' = b.max_backup_start
   AND a.backup_data->'timestamp'->>'stop' = b.max_backup_stop;
```

```
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=2626697.50..3279629.74 rows=31 width=168) (actual time=1.019..1.232 rows=2 loops=1)
   Hash Cond: ((a.config_file = per_stanza.config_file) AND (a.stanza = per_stanza.stanza) AND ((a.backup_data ->> 'type'::text) = ((per_stanza.backup_data ->> 'type'::text))) AND (((a.backup_data -> 'timestamp'::text) ->> 'start'::text) = (max(((per_stanza.backup_data -> 'timestamp'::text) ->> 'start'::text)))) AND (((a.backup_data -> 'timestamp'::text) ->> 'stop'::text) = (max(((per_stanza.backup_data -> 'timestamp'::text) ->> 'stop'::text)))))
   CTE all_backups
     ->  ProjectSet  (cost=0.25..517.75 rows=100000 width=64) (actual time=0.405..0.406 rows=1 loops=1)
           ->  Function Scan on pgbackrest_info  (cost=0.25..10.25 rows=1000 width=64) (actual time=0.381..0.382 rows=1 loops=1)
   CTE per_stanza
     ->  ProjectSet  (cost=0.00..53250.00 rows=10000000 width=96) (actual time=0.415..0.419 rows=6 loops=1)
           ->  CTE Scan on all_backups  (cost=0.00..2000.00 rows=100000 width=64) (actual time=0.407..0.408 rows=1 loops=1)
   ->  CTE Scan on per_stanza a  (cost=0.00..200000.00 rows=10000000 width=96) (actual time=0.417..0.419 rows=6 loops=1)
   ->  Hash  (cost=2527968.75..2527968.75 rows=1000000 width=160) (actual time=0.246..0.247 rows=2 loops=1)
         Buckets: 65536  Batches: 32  Memory Usage: 512kB
         ->  HashAggregate  (cost=2134375.00..2517968.75 rows=1000000 width=160) (actual time=0.104..0.128 rows=2 loops=1)
               Group Key: per_stanza.config_file, per_stanza.stanza, (per_stanza.backup_data ->> 'type'::text)
               Planned Partitions: 64  Batches: 1  Memory Usage: 409kB
               ->  CTE Scan on per_stanza  (cost=0.00..225000.00 rows=10000000 width=128) (actual time=0.002..0.012 rows=6 loops=1)
 Planning Time: 0.237 ms
 Execution Time: 1.374 ms
(17 rows)
```
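For anyone wanting to try this locally, JIT can be turned off at several scopes; a sketch (the `ccp_monitoring` role name is an assumption based on common monitoring setups, adjust to yours):

```sql
-- Session level: affects only the current connection
SET jit = off;

-- Role level: only the monitoring role used by postgres-exporter
-- (role name is an assumption; substitute your own)
ALTER ROLE ccp_monitoring SET jit = off;

-- Instance level: persists across restarts, no config-file edit needed
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
```

The role-level variant is the least invasive option, since it leaves JIT available for regular application queries.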
In light of these findings, it would be good to also disable `jit` in the `postgresql.conf` templates.
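A minimal sketch of the corresponding `postgresql.conf` change (the `jit` GUC exists since PostgreSQL 11):

```
# Disable JIT compilation: avoids the per-query LLVM compilation
# overhead seen in the JIT timing above and the associated memory growth
jit = off
```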
I am currently testing the setup with `jit` disabled and will know within the next day whether the memory consumption stays stable.
To Reproduce
Steps to reproduce the behavior:

Expected behavior
Memory consumption should remain at a stable level.
Screenshots will follow
Please tell us about your environment:
Screenshot of the Memory Graph
Please see #1309, which links to a full explanation.