Open kurtwheeler opened 3 years ago
I just manually vacuumed the processor job and downloader job tables using VACUUM ANALYZE [table]
, and anecdotally the queries do seem to be firing somewhat faster. It looks like this is definitely something we need to look into.
I think we need to add:
# Temporary. This will go away once we can delete things.
resource "aws_db_parameter_group" "postgres_parameter_group" {
name = "postgres-parameter-group-${var.user}-${var.stage}"
description = "Postgres Parameters ${var.user} ${var.stage}"
family = "postgres11"
parameter {
name = "deadlock_timeout"
value = "60000" # 60000ms = 60s
}
parameter {
name = "statement_timeout"
value = "60000" # 60000ms = 60s
}
# autovacuum will not work by default
parameters {
name = "autovacuum"
value = 1
}
tags = var.default_tags
}
But I still need to find out how to enable track_counts
As per the docs
autovacuum (boolean) Controls whether the server should run the autovacuum launcher daemon. This is on by default; however, track_counts must also be enabled for autovacuum to work. This parameter can only be set in the postgresql.conf file or on the server command line; however, autovacuuming can be disabled for individual tables by changing table storage parameters.
I don't think we need to add that. As I mentioned before, autovacuum is indeed running, it's just not running for all tables. We probably need to tweak thresholds. I think https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum.html#Appendix.PostgreSQL.CommonDBATasks.Autovacuum.NeedVacuuming covers how those work
data_refinery=> select name, setting from pg_settings where name in ('autovacuum', 'track_counts');
name | setting
--------------+---------
autovacuum | on
track_counts | on
Context
In https://github.com/AlexsLemonade/refinebio/pull/2867 @wvauclain wrote:
Problem or idea
We have autovacuum on by defaulting and it is running for some tables. It doesn't seem to have triggered for the original_files nor downloader_jobs tables. I'm not sure if this is expected or not. I started looking into it using this query:
but I'm not sure what most of the columns returned mean, only that
last_autovacuum
being null must mean that a table hasn't ever been autovacuumed.Solution or next step
Look into https://aws.amazon.com/blogs/database/understanding-autovacuum-in-amazon-rds-for-postgresql-environments/, https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum.html#Appendix.PostgreSQL.CommonDBATasks.Autovacuum.NeedVacuuming, and the results of that query to see if there's something we can do to improve the query performance of those tables, perhaps by getting autovacuum to trigger for them.