MaterializeInc / materialize

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.
https://materialize.com
Other
5.72k stars 466 forks source link

Flatten the proto representation of dataflow plans #13436

Open antiguru opened 2 years ago

antiguru commented 2 years ago

Follow-up to #13431.

The proto representation of dataflow plans follows the in-memory representation. The structure can be deeply nested, for example when creating large join plans. The recursive nature of the plan influences the code working with it to be also recursive. While easy to maintain, it has strong performance implications as large amounts of stack space can be required to work with these structures.

A better representation would be to flatten the structure into a list of plan fragments, which then merely contain pointers into the flattened vector instead of pointers in memory. This would convert the encoding and decoding steps to a linear scan through a vector.

We can tackle this twofold:

aalexandrov commented 2 years ago

After looking at the specific error message in #13431, the recursion depth is a consequence of the CSE which introduces a right-deep tree of Plan::Let variants. The issue should disappear if we move from

enum Plan<T> {
    Let {
        /// The local identifier to be used, available to `body` as `Id::Local(id)`.
        id: LocalId,
        /// The collection that should be bound to `id`.
        value: Box<Plan<T>>,
        /// The collection that results, which is allowed to contain `Get` stages
        /// that reference `Id::Local(id)`.
        body: Box<Plan<T>>,
    },
}

to

struct Binding<T>> {
    /// The local identifier for this binding.
    id: LocalId,
    /// The collection that should be bound to `id`.
    value: Box<Plan<T>>,
}

enum Plan<T> {
      Let {
          /// A list of bindings to be used.
          /// `bindings[i]` is in scope in `bindings[j]` for `j > i` and in `body`.
          bindings: Vec<Binding<T>>`,
          /// The collection that results, which is allowed to contain `Get` stages
          /// that reference `Id::Local(id)`.
          body: Box<Plan<T>>,
      },
      // ...
}

and implement a transformation that flattens Let { b1, Let { b2, body } } into Let { bindings: b1::b2, body }.

This is something that will reduce the overall depth of our plans in all representations (including HIR and MIR).