cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.21k stars 3.82k forks source link

sql/stats: support SHOW HISTOGRAM for forecasted stats #86358

Open michae2 opened 2 years ago

michae2 commented 2 years ago

Follow up from #79872: we currently support SHOW STATISTICS ... WITH FORECAST but the histogram_id field is always 0 for forecasts, and SHOW HISTOGRAM does not work.

For example:

CREATE TABLE h (a INT PRIMARY KEY) WITH (sql_stats_automatic_collection_enabled = false);
ALTER TABLE h INJECT STATISTICS '[
      {
          "avg_size": 1,
          "columns": [
              "a"
          ],
          "created_at": "2022-08-13 00:00:00.000000",
          "distinct_count": 3,
          "histo_buckets": [
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "4"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "5"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "6"
              }
          ],
          "histo_col_type": "INT8",
          "histo_version": 2,
          "name": "__auto__",
          "null_count": 0,
          "row_count": 3
      },
      {
          "avg_size": 1,
          "columns": [
              "a"
          ],
          "created_at": "2022-08-14 00:00:00.000000",
          "distinct_count": 3,
          "histo_buckets": [
              {
                  "distinct_range": 0,
                  "num_eq": 0,
                  "num_range": 0,
                  "upper_bound": "7"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "8"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "9"
              }
          ],
          "histo_col_type": "INT8",
          "histo_version": 2,
          "name": "__auto__",
          "null_count": 0,
          "row_count": 3
      },
      {
          "avg_size": 1,
          "columns": [
              "a"
          ],
          "created_at": "2022-08-15 00:00:00.000000",
          "distinct_count": 3,
          "histo_buckets": [
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "10"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "11"
              },
              {
                  "distinct_range": 0,
                  "num_eq": 1,
                  "num_range": 0,
                  "upper_bound": "12"
              }
          ],
          "histo_col_type": "INT8",
          "histo_version": 2,
          "name": "__auto__",
          "null_count": 0,
          "row_count": 3
      }
]';
SHOW STATISTICS FOR TABLE h WITH FORECAST;

We can show histograms of the collected statistics, but not the forecasted statistics:

demo@127.0.0.1:26257/defaultdb> SHOW STATISTICS FOR TABLE h WITH FORECAST;
  statistics_name | column_names |       created       | row_count | distinct_count | null_count | avg_size |    histogram_id
------------------+--------------+---------------------+-----------+----------------+------------+----------+---------------------
  __auto__        | {a}          | 2022-08-13 00:00:00 |         3 |              3 |          0 |        1 | 788759343272853505
  __auto__        | {a}          | 2022-08-14 00:00:00 |         3 |              3 |          0 |        1 | 788759343275540481
  __auto__        | {a}          | 2022-08-15 00:00:00 |         3 |              3 |          0 |        1 | 788759343277932545
  __forecast__    | {a}          | 2022-08-16 00:00:00 |         3 |              3 |          0 |        1 |                  0
(4 rows)

Time: 4ms total (execution 4ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SHOW HISTOGRAM 788759343272853505;
  upper_bound | range_rows | distinct_range_rows | equal_rows
--------------+------------+---------------------+-------------
  4           |          0 |                   0 |          1
  5           |          0 |                   0 |          1
  6           |          0 |                   0 |          1
(3 rows)

Time: 2ms total (execution 2ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SHOW HISTOGRAM 788759343277932545;
  upper_bound | range_rows | distinct_range_rows | equal_rows
--------------+------------+---------------------+-------------
  10          |          0 |                   0 |          1
  11          |          0 |                   0 |          1
  12          |          0 |                   0 |          1
(3 rows)

Time: 2ms total (execution 2ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> SHOW HISTOGRAM 0;
ERROR: histogram 0 not found

As a workaround we can use SHOW STATISTICS USING JSON ... WITH FORECAST but it's pretty ugly:

demo@127.0.0.1:26257/defaultdb> SELECT jsonb_array_elements(stat->'histo_buckets')
FROM (
  SELECT jsonb_array_elements(statistics) AS stat
  FROM [SHOW STATISTICS USING JSON FOR TABLE h WITH FORECAST]
)
WHERE stat->>'name' = '__forecast__';
                           jsonb_array_elements
---------------------------------------------------------------------------
  {"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "13"}
  {"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "14"}
  {"distinct_range": 0, "num_eq": 1, "num_range": 0, "upper_bound": "15"}
(3 rows)

Time: 4ms total (execution 4ms / network 0ms)

I think the trick to doing this will be to encode the table ID and column ID in the histogram_id.

Jira issue: CRDB-18713

github-actions[bot] commented 9 months ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!