MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com
Other
5.66k stars 457 forks source link

`Let` canonicalization #16362

Open frankmcsherry opened 1 year ago

frankmcsherry commented 1 year ago

Materialize uses Let operators in expressions to allow for re-use. At the moment, they can be arbitrary AST nodes allowing for scoped re-use, but this freedom comes with some amount of stress and doesn't necessarily gain us what it would in an imperative language.

  1. Identifiers can (and are) re-used, and one must take some care in manipulating the expressions to ensure that bound state is correctly invalidated when leaving a Let operator (or more often, not do this and hope that UpdateLets has been called recently).
  2. Transforms and traversals need to think about Let operators at any moment of their traversal, and it may be easier to put all bindings at one location (the root, say).
  3. Arbitrary Let nodes in the AST results in deeper ASTs that more often tickle recursion limits.

I don't have a list of good things that nested Let operators would allow, other than scoping for correctness and resource reclamation. The latter doesn't really apply in our execution model, and the former seems .. to be a nuanced trade-off of footguns.

So, a proposal is a pile of work in support of "put all Let bindings at the root". Candidate work items include:

  1. Introduce "lets at the root" as a principle we commit to for MirRelationExpr. a. Introduce a transform that migrates all Let expressions to the AST root; b. ensure this transform is applied after any action that might introduce an internal Let expression; c. introduce debug checks that expressions are always in this form.
  2. Migrate MirRelationExpr::Let to a MirRelationExpr::MultiLet variant. a. Introduce a MultiLet variant; panic in each transform if it is observed. b. Introduce a local transform to convert a stack of Let expressions to a MultiLet and back again. c. Incrementally, implement MultiLet behavior for transforms, remove the panic, and redirect Let implementations to convert to MultiLet and redirect to that implementation. d. Once all unimplemented!() panics are removed, apply the conversion to MultiLet as soon as possible (as part of lowering?). e. Remove the Let variant, and perhaps rename MultiLet to Let.
  3. Remove Let from LIR a. Move all Let bindings at the root of an LIR expression into the dataflow objects to construct. b. Move all Let bindings to the root of MIR first, to ensure all are extracted. c. Soft panic if we observe a Let as part of an LIR plan. b. Do the movement as part of conversion to LIR, and remove the Let variant.

These are not all equally important. I put them in the order that seems most significant to me, but they are meant to be independently achievable.

(Slack thread: https://materializeinc.slack.com/archives/C02PPB50ZHS/p1669727395057499)

aalexandrov commented 1 year ago

We have code for

  1. Introduce "lets at the root" as a principle we commit to for MirRelationExpr."

in explain_new/mir/mod.rs (look for normalize_lets_in_tree) that I'm happy to convert into a first-class Transform implementation.

aalexandrov commented 1 year ago

Also, I think the steps in (2) should include the introduction of multi_let_optimzier feature flag, and we should change transforms during that process to only go through the MultiLet route if this flag is enabled.

ggevay commented 1 year ago
  1. Introduce "lets at the root" as a principle we commit to for MirRelationExpr."

Doesn't InlineLet also do this? The if inlinable { part is either inlining a Let, in which case the Let disappears, or it puts the Let into lets. Then, outside action, the for loop on lets will place those one-by-one at the top of the expression.

frankmcsherry commented 1 year ago

InlineLet may do this, but isn't documented to do so. I read through the description briefly, and it only mentions inlining of lets, rather than promoting anything. It would be good to have a CanonicalizeLets transform that does both InlineLet and UpdateLet as appropriate. But certainly, if the code does already do this, then we should just use it and document it as such.

ggevay commented 2 weeks ago

NormalizeLets has done a part of this, but not all. In particular: