divviup / janus

Experimental implementation of the Distributed Aggregation Protocol (DAP) specification.
Mozilla Public License 2.0
53 stars 15 forks source link

Review unused indices #1739

Open divergentdave opened 1 year ago

divergentdave commented 1 year ago

We should review our indices from time to time to see if there are any that are not being used, either due to code changes or unexpected query plan behavior. Here's a query to use as a starting point, and its results on the staging database. (We ought to try this on a database with more tasks and with use of both query types, plus fixed size time bucketing, before drawing any conclusions) This query ignores primary indices and unique indices, because those are always being used on inserts even if they are never scanned.

postgres=> SELECT schemaname, pg_stat_all_indexes.relname, indexrelname, amname, indpred IS NOT NULL AS is_partial, idx_scan FROM pg_stat_all_indexes LEFT JOIN pg_index ON pg_stat_all_indexes.indexrelid = pg_index.indexrelid LEFT JOIN pg_class ON pg_stat_all_indexes.indexrelid = pg_class.oid LEFT JOIN pg_am ON pg_class.relam = pg_am.oid WHERE idx_scan <= 10 AND schemaname != 'pg_toast' AND schemaname != 'pg_catalog' AND NOT pg_index.indisprimary AND NOT pg_index.indisunique;
 schemaname |       relname        |                    indexrelname                     | amname | is_partial | idx_scan 
------------+----------------------+-----------------------------------------------------+--------+------------+----------
 public     | collection_jobs      | collection_jobs_state_and_lease_expiry              | btree  | t          |        0
 public     | outstanding_batches  | outstanding_batches_task_and_time_bucket_index      | btree  | f          |        0
 public     | tasks                | task_id_index                                       | btree  | f          |        0
 public     | aggregation_jobs     | aggregation_jobs_task_and_batch_id                  | btree  | f          |        0
 public     | aggregation_jobs     | aggregation_jobs_task_and_client_timestamp_interval | gist   | f          |        0
 public     | collection_jobs      | collection_jobs_interval_containment_index          | gist   | f          |        0
 public     | aggregate_share_jobs | aggregate_share_jobs_interval_containment_index     | gist   | f          |        0
SELECT
    schemaname,
    pg_stat_all_indexes.relname,
    indexrelname,
    amname,
    indpred IS NOT NULL AS is_partial,
    idx_scan
FROM pg_stat_all_indexes
LEFT JOIN pg_index ON pg_stat_all_indexes.indexrelid = pg_index.indexrelid
LEFT JOIN pg_class ON pg_stat_all_indexes.indexrelid = pg_class.oid
LEFT JOIN pg_am ON pg_class.relam = pg_am.oid
WHERE idx_scan <= 10
AND schemaname != 'pg_toast'
AND schemaname != 'pg_catalog'
AND NOT pg_index.indisprimary
AND NOT pg_index.indisunique;
divergentdave commented 1 year ago

We could also write a query for postgres_exporter to load the idx_scan counts into Prometheus. That would let us look back over a few hours or days with a rate() query and identify any idx_scan values that have stopped increasing.