MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com

Validating optimizer changes against production catalogs #17883

Open teskje opened 1 year ago

teskje commented 1 year ago

To further ensure, before a release, that there are no regressions in the plans produced by the optimizer, it would be helpful to have a way to run it against the DDL statements stored in production catalogs.

The way this could work is by having a framework that extracts the DDL statements from a production catalog, replans them with the new optimizer, and compares the resulting plans with those produced by the current optimizer (see the sketch below).

The validation should at least verify that the new optimizer can handle the DDL without errors. Beyond that, we can also verify that the resulting plans are identical or have changed only in expected ways.
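As a rough illustration, here is a minimal sketch of such a harness in Rust, using tokio-postgres to talk over pgwire to two Materialize instances, one running the released optimizer and one running the candidate build. The ports, the example DDL, and the exact `EXPLAIN OPTIMIZED PLAN FOR ...` form used for replanning are assumptions for illustration, not existing tooling:

```rust
use tokio_postgres::{Client, NoTls};

/// Ask the server to plan (but not execute) a DDL statement and return the
/// rendered optimized plan. The EXPLAIN form used here is an assumption.
async fn optimized_plan(client: &Client, ddl: &str) -> Result<String, tokio_postgres::Error> {
    let sql = format!("EXPLAIN OPTIMIZED PLAN FOR {ddl}");
    let rows = client.query(sql.as_str(), &[]).await?;
    // EXPLAIN returns one text column per output line; stitch them together.
    Ok(rows
        .iter()
        .map(|row| row.get::<_, String>(0))
        .collect::<Vec<_>>()
        .join("\n"))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical endpoints: `old` runs the current optimizer, `new` the candidate.
    let (old, old_conn) =
        tokio_postgres::connect("host=localhost port=6875 user=materialize", NoTls).await?;
    let (new, new_conn) =
        tokio_postgres::connect("host=localhost port=6876 user=materialize", NoTls).await?;
    tokio::spawn(old_conn);
    tokio::spawn(new_conn);

    // In a real harness, these would be extracted from a production catalog.
    let ddls = ["CREATE MATERIALIZED VIEW v AS SELECT a, count(*) FROM t GROUP BY a"];

    for ddl in ddls {
        let plan_old = optimized_plan(&old, ddl).await?;
        let plan_new = optimized_plan(&new, ddl).await?;
        if plan_old != plan_new {
            println!("plan changed for: {ddl}\n--- old ---\n{plan_old}\n--- new ---\n{plan_new}");
        }
    }
    Ok(())
}
```

Diffing rendered plan text is crude but cheap; a real framework would presumably compare structured plan representations, so that expected changes can be classified and filtered out.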

philip-stoev commented 1 year ago

https://resources.snowflake.com/report/snowtrail-testing-with-production-series-on-a-cloud-database

ggevay commented 1 year ago

Some ideas for specific automated checks (from https://materializeinc.slack.com/archives/C04QHS93G0Z/p1678703336777329?thread_ts=1678646973.384439&cid=C04QHS93G0Z):

mgree commented 1 year ago

Automated checks could also use a (hypothetical, for now) explicit cost model on MIR to check for regressions.
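To make that concrete, here is a toy sketch: a stand-in `Mir` tree (the real `MirRelationExpr` has many more variants) with an entirely made-up per-operator cost function, used to flag plans whose estimated cost grew beyond a tolerance. Everything here, including the weights and the threshold, is hypothetical:

```rust
/// A toy stand-in for MIR; the real `MirRelationExpr` is much richer.
enum Mir {
    Get,
    Filter { input: Box<Mir> },
    Reduce { input: Box<Mir> },
    Join { inputs: Vec<Mir> },
}

/// A made-up cost model: each operator contributes a fixed cost plus the
/// cost of its inputs, with reductions and joins weighted more heavily.
fn cost(expr: &Mir) -> f64 {
    match expr {
        Mir::Get => 1.0,
        Mir::Filter { input } => 1.0 + cost(input),
        Mir::Reduce { input } => 5.0 + cost(input),
        Mir::Join { inputs } => 10.0 * inputs.len() as f64 + inputs.iter().map(cost).sum::<f64>(),
    }
}

/// Flag a regression if the new plan is estimated to be noticeably more
/// expensive than the old one; the tolerance absorbs noise in the model.
fn is_regression(old_plan: &Mir, new_plan: &Mir) -> bool {
    const TOLERANCE: f64 = 1.10;
    cost(new_plan) > TOLERANCE * cost(old_plan)
}

fn main() {
    let old = Mir::Filter { input: Box::new(Mir::Get) };
    let new = Mir::Join { inputs: vec![Mir::Get, Mir::Get] };
    println!("regression: {}", is_regression(&old, &new)); // prints: regression: true
}
```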

philip-stoev commented 1 year ago

Proposed implementation https://www.notion.so/materialize/1c36cdc61a224948b623318a02db3bb8

ggevay commented 1 year ago

This was again relevant for incident-55: making sure that the incident was not caused by plan changes was quite time-consuming, because I had to dig through the audit log to verify that the same indexes existed when the dataflows were originally created. It would be great to automate this.
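For illustration, a check along these lines could query the audit log for index creations and drops up to the dataflow's creation time, and replay them to reconstruct which indexes existed at that moment. This is only a sketch: I'm assuming that `mz_catalog.mz_audit_events` (with its `event_type`, `object_type`, `details`, and `occurred_at` columns) carries enough detail to do this, which would need verifying:

```rust
use std::time::SystemTime;
use tokio_postgres::{Client, Row};

/// Fetch all index create/drop events up to the time a dataflow was created,
/// so the set of indexes that existed at that moment can be reconstructed.
/// Hypothetical sketch: assumes the audit log's `details` column identifies
/// each index and its cluster well enough to replay these events.
async fn index_events_up_to(
    client: &Client,
    dataflow_created_at: SystemTime,
) -> Result<Vec<Row>, tokio_postgres::Error> {
    client
        .query(
            "SELECT event_type, details, occurred_at \
             FROM mz_catalog.mz_audit_events \
             WHERE object_type = 'index' AND occurred_at <= $1 \
             ORDER BY occurred_at",
            &[&dataflow_created_at],
        )
        .await
}
```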

ggevay commented 1 month ago

In the meantime, various things have started to go in this direction:

We should decide which of these we'd like to develop further to perform the optimizer checks proposed here.