NikolayS / postgres_dba

The missing set of useful tools for Postgres DBAs and all engineers
BSD 3-Clause "New" or "Revised" License
1.03k stars 113 forks source link

bloat estimate precise question #58

Closed bagafoot closed 2 years ago

bagafoot commented 2 years ago

Hello, I'm comparing bloate results between pgstattuple and posgres_dba and its varying a lot. results from same table

pgstattuple: 1.93% postgrtesq_dba: 38.77%

commands: pgstattuple: SELECT pg_size_pretty(pg_relation_size('test_table')) as table_size, (pgstattuple('test_table')).dead_tuple_percent;

:dba: b1

NikolayS commented 2 years ago

@bagafoot this is expected – that's why it's called estimate.

pgstattuple performs scans, it's slow and causes intensive IO on large tables/indexes,

while the "estimated" approach is doing the job very fast. But sometimes quite wrong, yes. Simple example when it's wrong:

check_bloat=# create table t as select i::int4, 2::int8, 3::int4 from generate_series(1, 10000000) i;
SELECT 10000000
check_bloat=# vacuum analyze t;
VACUUM
check_bloat=# :dba
Menu:
   0 – Node & current DB information: master/replica, lag, DB size, tmp files, etc.
   1 – Databases: size, stats
   2 – Tables: table/index/TOAST size, number of rows
   3 – Load profile
  a1 – Current activity: count of current connections grouped by database, user name, state
  b1 – Table bloat (estimated)
  b2 – B-tree index bloat (estimated)
  b3 – Table bloat (requires pgstattuple; expensive)
  b4 – B-tree indexes bloat (requires pgstattuple; expensive)
  b5 – Tables and columns without stats (so bloat cannot be estimated)
  e1 – Extensions installed in current DB
  i1 – Unused and rarely used indexes
  i2 – Redundant indexes
  i3 – FKs with Missing/Bad Indexes
  i4 – Invalid indexes
  i5 – Cleanup unused and redundant indexes – DO & UNDO migration DDL
  l1 – Lock trees (leightweight)
  l2 – Lock trees, detailed (based on pg_blocking_pids())
  p1 – [EXP] Alignment padding: how many bytes can be saved if columns are reordered?
  s1 – Slowest queries, by total time (requires pg_stat_statements)
  s2 – Slowest queries report (requires pg_stat_statements)
  t1 – Postgres parameters tuning
  v1 – Vacuum: current activity
  v2 – Vacuum: VACUUM progress and autovacuum queue
   q – Quit

Type your choice and press <Enter>:
b1
 Is N/A | Table |  Size  |      Extra      | Bloat estimate  |  Live   |    Last Vaccuum     | Fillfactor
--------+-------+--------+-----------------+-----------------+---------+---------------------+------------
        | t     | 498 MB | ~77 MB (15.43%) | ~77 MB (15.43%) | ~421 MB | 2022-01-04 11:28:12 |        100
(1 row)

-- here we have "gaps" in storage due to alignment padding (see, for example, here: https://docs.gitlab.com/ee/development/ordering_table_columns.html) and b1 thinks we have 15% bloat, while it's 0 on the fresh table.

If you have ideas on how to improve it – definitely welcome. Closing for now because the behavior is as expected and the report is marked with the word "estimated".