cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.96k stars 3.79k forks source link

sql/stats: some forecasting cluster settings do not immediately take effect #123852

Open michae2 opened 4 months ago

michae2 commented 4 months ago

The following cluster settings do not immediately take effect, and instead only take effect when new statistics are collected for a table.

Confusingly, SHOW STATISTICS WITH FORECAST seems to show them taking effect immediately, but they do not actually take effect until new statistics are collected, as can be seen with EXPLAIN.

Here's a demonstration using sql.stats.forecasts.min_observations:

CREATE TABLE a (a INT PRIMARY KEY) WITH (sql_stats_automatic_collection_enabled = false);
INSERT INTO a VALUES (1);
CREATE STATISTICS __auto__ FROM a;

-- neither SHOW nor EXPLAIN indicate that we're using a forecast
SHOW STATISTICS FOR TABLE a WITH FORECAST;
EXPLAIN SELECT * FROM a;

-- set the cluster setting
SET CLUSTER SETTING sql.stats.forecasts.min_observations = 1;

-- SHOW seems to show the cluster setting taking effect
SHOW STATISTICS FOR TABLE a WITH FORECAST;
-- but in EXPLAIN we do not see "using stats forecast"
EXPLAIN SELECT * FROM a;

-- it's not until *another* statistics collection that the cluster setting takes effect
CREATE STATISTICS __auto__ FROM a;

-- now both SHOW and EXPLAIN indicate that we're using a forecast
SHOW STATISTICS FOR TABLE a WITH FORECAST;
EXPLAIN SELECT * FROM a;

These settings were added in https://github.com/cockroachdb/cockroach/pull/122779, which failed to add any kind of invalidation of the stats cache when they are changed.

This is true as of v23.1.20 and v24.1.0-beta.3 (these settings have not yet been released in v23.2).

Jira issue: CRDB-38572

michae2 commented 4 months ago

There is a workaround: disabling and re-enabling forecasting will invalidate the stats cache and cause these settings to take effect immediately. This can be done with the cluster setting:

SET CLUSTER SETTING sql.stats.forecasts.enabled = off;
-- wait a few seconds
SET CLUSTER SETTING sql.stats.forecasts.enabled = on;

or a per-table setting to be a little less disruptive:

ALTER TABLE foo SET (sql_stats_forecasts_enabled = false);
-- wait a few seconds
ALTER TABLE foo RESET (sql_stats_forecasts_enabled);

Here's a demonstration of the workaround:

RESET CLUSTER SETTING sql.stats.forecasts.min_observations;

CREATE TABLE b (b INT PRIMARY KEY) WITH (sql_stats_automatic_collection_enabled = false);
INSERT INTO b VALUES (2);
CREATE STATISTICS __auto__ FROM b;

-- not using a forecast
EXPLAIN SELECT * FROM b;

-- set the cluster setting
SET CLUSTER SETTING sql.stats.forecasts.min_observations = 1;

-- still not using a forecast
EXPLAIN SELECT * FROM b;

-- disabling forecasting for one statement will invalidate the stats cache
ALTER TABLE b SET (sql_stats_forecasts_enabled = false);
EXPLAIN SELECT * FROM b;
ALTER TABLE b RESET (sql_stats_forecasts_enabled);

-- now we see "using stats forecast" in the plan, even without another stats collection
EXPLAIN SELECT * FROM b;
michae2 commented 2 weeks ago

Note that after https://github.com/cockroachdb/cockroach/pull/127414 there is a simpler workaround:

SELECT crdb_internal.clear_table_stats_cache();