MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com

Add observability to the query optimizer #16344

Open antiguru opened 1 year ago

antiguru commented 1 year ago

Feature request

The query optimizer currently exposes information about its inner workings only through EXPLAIN, which has limitations. It needs to be explicitly issued, and it only shows the state of a query at discrete points, for example after decorrelation. It does not let us observe the behavior and performance of the optimizer more comprehensively.

To improve the situation we should add per-transform statistics that capture:

This data could be represented as a table containing a timing histogram per transformation and whether it changed its inputs.
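As a rough illustration of what such a table could hold, here is a minimal sketch in Rust. Everything in it is hypothetical: the names `TransformStats`, `OptimizerStats`, and `record`, the bucket bounds, and the transform name used in the usage note are assumptions for illustration, not existing Materialize APIs.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Upper bounds (in microseconds) of the histogram buckets. The values are
/// illustrative; anything above the last bound lands in an overflow bucket.
const BUCKET_BOUNDS_US: [u64; 5] = [10, 100, 1_000, 10_000, 100_000];
const BUCKETS: usize = BUCKET_BOUNDS_US.len() + 1;

/// Hypothetical per-transform statistics: a timing histogram plus a count of
/// how often the transform changed its input.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TransformStats {
    /// One counter per bound in `BUCKET_BOUNDS_US`, plus one overflow bucket.
    pub buckets: [u64; BUCKETS],
    /// Total number of recorded applications.
    pub invocations: u64,
    /// Number of applications that changed the plan.
    pub changed: u64,
}

impl TransformStats {
    /// Records one application of the transform.
    pub fn record(&mut self, elapsed: Duration, changed: bool) {
        let micros = elapsed.as_micros() as u64;
        // First bucket whose upper bound covers the sample, else overflow.
        let idx = BUCKET_BOUNDS_US
            .iter()
            .position(|&bound| micros <= bound)
            .unwrap_or(BUCKET_BOUNDS_US.len());
        self.buckets[idx] += 1;
        self.invocations += 1;
        if changed {
            self.changed += 1;
        }
    }
}

/// Hypothetical table of statistics, one row per transform.
#[derive(Debug, Default)]
pub struct OptimizerStats {
    pub per_transform: BTreeMap<String, TransformStats>,
}

impl OptimizerStats {
    /// Records one application of `transform`, creating its row on first use.
    pub fn record(&mut self, transform: &str, elapsed: Duration, changed: bool) {
        self.per_transform
            .entry(transform.to_string())
            .or_default()
            .record(elapsed, changed);
    }
}
```

A caller would wrap each transform application with a timer and call something like `stats.record("fusion", elapsed, plan_changed)`. Fixed bucket bounds keep recording cheap, which matters if the statistics are collected on every query rather than on demand.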

Related: https://github.com/MaterializeInc/materialize/issues/13140

aalexandrov commented 1 year ago

Several optimizations run multiple times per pipeline, so I think we should consider using the stage path (which identifies a transform application within the pipeline) rather than the transform name as the key for this table.
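To make the difference concrete, here is a minimal Rust sketch of keying by stage path. The `StagePath` type, the `ChangeCounts` row, and the path segments used in the usage note are all invented for illustration and do not mirror the optimizer's actual stage names or types.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical key: the position of a transform application within the
/// pipeline, stored as a list of path segments.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct StagePath(Vec<String>);

impl StagePath {
    pub fn new(segments: &[&str]) -> Self {
        StagePath(segments.iter().map(|s| s.to_string()).collect())
    }

    /// The last segment, taken here to be the transform name.
    pub fn transform_name(&self) -> Option<&str> {
        self.0.last().map(String::as_str)
    }
}

impl fmt::Display for StagePath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join("/"))
    }
}

/// Hypothetical row: how often a stage ran and how often it changed the plan.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ChangeCounts {
    pub runs: u64,
    pub changed: u64,
}

/// Records one application of the stage identified by `path`.
pub fn record(table: &mut BTreeMap<StagePath, ChangeCounts>, path: StagePath, changed: bool) {
    let row = table.entry(path).or_default();
    row.runs += 1;
    if changed {
        row.changed += 1;
    }
}

/// Rolls the per-path table up to one row per transform name, which is what
/// a name-keyed table would have stored in the first place.
pub fn by_transform_name(
    table: &BTreeMap<StagePath, ChangeCounts>,
) -> BTreeMap<String, ChangeCounts> {
    let mut out: BTreeMap<String, ChangeCounts> = BTreeMap::new();
    for (path, row) in table {
        if let Some(name) = path.transform_name() {
            let agg = out.entry(name.to_string()).or_default();
            agg.runs += row.runs;
            agg.changed += row.changed;
        }
    }
    out
}
```

With this key, two applications such as `logical/fusion` and `physical/fusion` stay in separate rows, so it remains visible which of them actually changes the plan. The per-name view can still be derived from the per-path table, whereas the reverse is not possible.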

I can think of the following additional metrics for which it would be good to collect histograms:

For the general optimization pass:

For some key moments of the pipeline:

We don't need to implement all of these at once, but if we build infrastructure for collecting histograms, these are some optimizer-related metrics that I think would be a good fit.

It would be quite nice to have direct access to these statistics so we can use them as part of our automation and alerting.

aalexandrov commented 1 year ago

> [...] it only shows the state of a query at discrete points, for example after decorrelation.

Strictly speaking this is not correct. With EXPLAIN OPTIMIZER TRACE you get information about the timing and the plan after each stage of the optimizer. It's quite expensive to collect this full trace at every step, though.

ggevay commented 1 year ago

We can also add metrics for window functions: