cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Capture optimizer statistics as cloud telemetry data #85838

Open kevin-v-ngo opened 2 years ago

kevin-v-ngo commented 2 years ago

We should capture optimizer table statistics used by the planner as defined by the following output (row counts, distinct counts, null counts, etc.): https://www.cockroachlabs.com/docs/stable/show-statistics.html#output

This will enable internal workload replay efforts.

Jira issue: CRDB-18463

rytaft commented 2 years ago

@kevin-v-ngo are you planning to spec this out more? Or should I assign someone from SQL Queries to work on it for 23.1?

kevin-v-ngo commented 2 years ago

@rachitgsrivastava, @jordanlewis can you confirm if https://www.cockroachlabs.com/docs/stable/show-statistics.html#output has everything we need for workload replay?

jordanlewis commented 2 years ago

It does not. It also needs the histogram data if we want to generate realistic fake data distributions. It would potentially be fine to start with just the counts (the output of SHOW STATISTICS), but that would be less complete for sure.
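
To illustrate why histograms matter here, the following is a minimal sketch of sampling fake values from histogram buckets. It is not CockroachDB code: the function name `sample_from_histogram` is hypothetical, the bucket keys (`upper_bound`, `num_eq`, `num_range`) are assumed to mirror the histogram fields in the statistics JSON, and upper bounds are simplified to plain integers.

```python
import random


def sample_from_histogram(buckets, n, seed=0):
    """Draw n integer values that roughly follow the given histogram.

    Assumption: each bucket has an integer upper_bound, num_eq rows equal
    to that bound, and num_range rows strictly between the previous
    bucket's bound and this one.
    """
    rng = random.Random(seed)
    ranges, weights = [], []
    prev = None
    for b in buckets:
        upper = b["upper_bound"]
        if b["num_eq"] > 0:
            # Rows exactly equal to the bucket's upper bound.
            ranges.append((upper, upper))
            weights.append(b["num_eq"])
        if b["num_range"] > 0 and prev is not None and upper - prev > 1:
            # Rows strictly inside the bucket, assumed uniformly spread.
            ranges.append((prev + 1, upper - 1))
            weights.append(b["num_range"])
        prev = upper
    picks = rng.choices(ranges, weights=weights, k=n)
    return [rng.randint(lo, hi) for lo, hi in picks]


# Hypothetical buckets: no rows fall strictly between 50 and 100.
buckets = [
    {"upper_bound": 1, "num_eq": 10, "num_range": 0},
    {"upper_bound": 50, "num_eq": 5, "num_range": 80},
    {"upper_bound": 100, "num_eq": 5, "num_range": 0},
]
values = sample_from_histogram(buckets, 1000)
```

With only row, distinct, and null counts, a generator would have no way to know that the range 51 to 99 is empty in this example.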

rytaft commented 2 years ago

SHOW STATISTICS USING JSON will include the histograms.

michae2 commented 1 year ago

Internally, there's a PrintTableStats function that returns the SHOW STATISTICS USING JSON output with the histograms removed. Maybe we could use that?
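
For anyone processing the JSON outside the database, a rough sketch of the same idea (dropping histograms from the statistics JSON) might look like this. This is not the PrintTableStats implementation; the key names `histo_buckets`, `histo_col_type`, and `histo_version` are assumed to match the SHOW STATISTICS USING JSON output, and the sample data is made up.

```python
import json

# Assumed names of the histogram-related keys in each statistics object.
HISTOGRAM_KEYS = ("histo_buckets", "histo_col_type", "histo_version")


def strip_histograms(stats_json):
    """Return the parsed statistics list with histogram fields removed."""
    stats = json.loads(stats_json)
    return [
        {k: v for k, v in stat.items() if k not in HISTOGRAM_KEYS}
        for stat in stats
    ]


# Made-up sample shaped like a single-column statistic.
sample = json.dumps([
    {
        "columns": ["a"],
        "distinct_count": 3,
        "null_count": 0,
        "row_count": 3,
        "histo_col_type": "INT8",
        "histo_buckets": [
            {"upper_bound": "1", "num_eq": 1, "num_range": 0, "distinct_range": 0}
        ],
    }
])
stripped = strip_histograms(sample)
```

The counts survive while the per-bucket data is dropped, which keeps the telemetry payload small at the cost of the distribution detail discussed above.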