Closed mkalish closed 1 year ago
3 maintenance windows required to complete the following:
tables | runtime minutes | command | window |
---|---|---|---|
✔️action | 45 | VACUUM VERBOSE ANALYZE "action"; | 1 |
✔️report_file | 41 | VACUUM VERBOSE ANALYZE "report_file"; | 1 |
✔️action_log | 40 | VACUUM VERBOSE ANALYZE "action_log"; | 1 |
✔️item_lineage | 120 | VACUUM VERBOSE ANALYZE "item_lineage"; | 2 |
✔️report_lineage | 60 | VACUUM VERBOSE ANALYZE "report_lineage"; | 3 |
✔️covid_result_metadata | 26 | VACUUM VERBOSE ANALYZE "covid_result_metadata"; | 3 |
✔️temp_patient_demographics | 22 | VACUUM VERBOSE ANALYZE "temp_patient_demographics"; | 3 |
✔️temp_action | 13 | VACUUM VERBOSE ANALYZE "temp_action"; | 3 |
✔️task | 8 | VACUUM VERBOSE ANALYZE "task"; | 3 |
✔️sender_ips | 1 | VACUUM VERBOSE ANALYZE "sender_ips"; | 3 |
✔️october_actions | 1 | VACUUM VERBOSE ANALYZE "october_actions"; | 3 |
✔️setting | 0 | VACUUM VERBOSE ANALYZE "setting"; | 3 |
✔️flyway_schema_history | 0 | VACUUM VERBOSE ANALYZE "flyway_schema_history"; | 3 |
✔️county_lat_long | 0 | VACUUM VERBOSE ANALYZE "county_lat_long"; | 3 |
✔️email_schedule | 0 | VACUUM VERBOSE ANALYZE "email_schedule"; | 3 |
✔️jti_cache | 0 | VACUUM VERBOSE ANALYZE "jti_cache"; | 3 |
✔️lookup_table_version | 0 | VACUUM VERBOSE ANALYZE "lookup_table_version"; | 3 |
✔️lookup_table_row | 0 | VACUUM VERBOSE ANALYZE "lookup_table_row"; | 3 |
✔️temp | 0 | VACUUM VERBOSE ANALYZE "temp"; | 3 |
✔️temp_vt_resubmissions | 0 | VACUUM VERBOSE ANALYZE "temp_vt_resubmissions"; | 3 |
✔️elr_result_metadata | 0 | VACUUM VERBOSE ANALYZE "elr_result_metadata"; | 3 |
✔️today_actions | 0 | VACUUM VERBOSE ANALYZE "today_actions"; | 3 |
✔️vw_livd_table | 0 | VACUUM VERBOSE ANALYZE "vw_livd_table"; | 3 |
✔️patient_gender | 0 | VACUUM VERBOSE ANALYZE "patient_gender"; | 3 |
✔️sender | 0 | VACUUM VERBOSE ANALYZE "sender"; | 3 |
✔️state | 0 | VACUUM VERBOSE ANALYZE "state"; | 3 |
377 |
@JosiahSiegel
As we chatted about earlier this week, I'd like to see if we could update the statistics threshold for some of the columns in report_lineage
and item_lineage
where the n_distinct
count is way off. Here are the SQL commands I tested locally:
ALTER TABLE item_lineage
ALTER COLUMN child_report_id
SET STATISTICS 1000;
ALTER TABLE item_lineage
ALTER COLUMN parent_report_id
SET STATISTICS 1000;
ALTER TABLE report_lineage
ALTER COLUMN child_report_id
SET STATISTICS 1000;
ALTER TABLE report_lineage
ALTER COLUMN parent_report_id
SET STATISTICS 1000;
analyze verbose report_lineage;
analyze verbose item_lineage;
Currently, for the two tables the n_distinct
values are:
tablename | attname | n_distinct |
---|---|---|
report_lineage | parent_report_id | 1216959.0 |
report_lineage | child_report_id | 84760.0 |
item_lineage | parent_report_id | 49342.0 |
item_lineage | child_report_id | 39858.0 |
query I used to get the numbers
select tablename, attname, n_distinct
from pg_stats
where (tablename = 'report_lineage'
or tablename = 'item_lineage')
and (attname = 'parent_report_id' or attname = 'child_report_id')
Maybe we could try this in a cloned DB to see the impact and I could a query that we could use to check performance too.
@mkalish results after testing on clone db:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">
tablename | attname | n_distinct -- | -- | -- report_lineage | child_report_id | 757686 report_lineage | parent_report_id | -0.167848 item_lineage | parent_report_id | 384048 item_lineage | child_report_id | 276785
Issue
Many of the ReportStream tables are insert-only which means that they never have vacuum run on them and subsequently are never analyzed. This means the stats for tables are incorrect leading to non-optimal queries
Description
Run both vacuum and analyze against all of the report stream tables that have never been analyzed
In rough priority order: