cockroachdb / cockroach


opt: improve autostats collection min/fraction stale rows defaults #94627

Open mgartner opened 1 year ago

mgartner commented 1 year ago

Copied from https://github.com/cockroachdb/cockroach/issues/64570#issuecomment-884362488:

Imagine a table that gets 100k inserts per day. It's been around for 1000 days so it now has 100m rows. With our default sql.stats.automatic_collection.fraction_stale_rows at 0.2, 25m rows need to be inserted to trigger automatic stats (25m stale rows / 125m total rows = 0.2), which won't be for 250 days. That's 25m rows and 250 days worth of values that are missing from histograms.
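The arithmetic above can be sketched as a small model. This is a simplification (the hypothetical helper name and the 500-row floor default are assumptions; the real refresh logic also involves randomized jitter, which this ignores), but it reproduces the 250-day figure:

```python
def days_until_autostats(rows_at_last_stats: int, inserts_per_day: int,
                         fraction_stale: float, min_stale: int = 500) -> float:
    """Days until auto stats would trigger, under a simplified model.

    Trigger condition (simplified): stats refresh when
    stale / (rows_at_last_stats + stale) >= fraction_stale, i.e.
    stale >= fraction_stale * rows_at_last_stats / (1 - fraction_stale),
    subject to a min_stale floor.
    """
    stale_needed = max(min_stale,
                       fraction_stale * rows_at_last_stats / (1 - fraction_stale))
    return stale_needed / inserts_per_day

# 100M-row table, 100k inserts/day, default fraction_stale_rows of 0.2:
print(days_until_autostats(100_000_000, 100_000, 0.2))  # 250.0
```

Note how the wait scales linearly with table size: at 1B rows the same workload would wait 2,500 days between refreshes.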

To make automatic stats much more aggressive, you can set sql.stats.automatic_collection.fraction_stale_rows to 0 and set sql.stats.automatic_collection.min_stale_rows to some number, say 10k. This ensures that no matter how big the table is, you're guaranteed to collect stats for the most recent rows every so often.
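Concretely, the cluster-wide tuning described above would look like this (setting names are from the text; the 10k value is illustrative):

```sql
SET CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows = 0;
SET CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows = 10000;
```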

Postgres has the ability to adjust these stats knobs at the table level. I don't believe we have that ability yet, but it would be useful for this; a user needs the ability to tune these knobs on a per-table basis based on the workload.

Here's how to tune auto-stats collection in Postgres for a specific table:

```sql
ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE t SET (autovacuum_vacuum_threshold = 10000);
```

We now have per-table autostats collection settings so that users can make auto-stats collection more regular for ever-growing tables:

sql_stats_automatic_collection_fraction_stale_rows
sql_stats_automatic_collection_min_stale_rows
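Assuming these are exposed as storage parameters analogous to the Postgres knobs above, per-table tuning would look like this (values illustrative):

```sql
ALTER TABLE t SET (sql_stats_automatic_collection_fraction_stale_rows = 0);
ALTER TABLE t SET (sql_stats_automatic_collection_min_stale_rows = 10000);
```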

However, we still receive a lot of inquiries that have this same root cause. We should consider doing some of the following:

Jira issue: CRDB-23054

michae2 commented 1 year ago

(See also #72418 and #71828 regarding configuring number of histogram buckets differently for large tables.)

yuzefovich commented 1 year ago

[becca] perhaps we'd introduce a "max stale rows" setting.

mgartner commented 1 year ago

@michae2 can you add a link in the JIRA ticket to the post-mortem document related to this issue?

rytaft commented 1 year ago

I'm going to remove the postmortem label since I think we've talked about a number of different stats improvements that might have helped this customer, and I'm not convinced that this is the one we should prioritize.

mgartner commented 1 year ago

I'd argue that this is something we should absolutely prioritize. Any customer with large tables (let's say 100+ million rows) is doomed to have extremely stale stats as a result of these defaults. Of course, this wouldn't be a magic fix for all stats-related issues, but it's a relatively simple change that provides incremental benefits. If we're unwilling to change the defaults, or to scale them based on the size of the table, then at the very least I think we need to increase awareness about this pitfall through documentation, blog posts, etc.

rytaft commented 1 year ago

Ok, fair enough! Added the label back. I just want to make sure that everything with this label is definitely something we are planning to do sooner rather than later, that it's scoped and specific, and that we have high confidence it will help the underlying issue.

mgartner commented 1 year ago

Makes sense. Let's leave the label for now so we can locate this easily for 23.2 planning. During planning we can refine this issue to make it scoped and specific.