Historically, filters (Filter, WhereFilter) and columns (Selectable, SelectColumn) have been assumed to be stateful unless we could easily prove otherwise. This prevents decomposing evaluations into sub-evaluations, and it also prevents parallelization; either action can re-order evaluations in a way that violates a user's assumptions regarding state.
We should:
Offer users a way to express that their evaluations are indeed stateful
Change the default assumption from "evaluations are stateful" to "evaluations are stateless"
Parallelize more, leveraging the new assumption
Introduce expression decomposition into our parser and related stack
Apply stateless filters to data indexes in where, and skip re-applying stateless filters whose input columns were not modified
Ensure that stateful filters act as a reordering barrier when applying filters (that is, a filter may never be moved across a stateful filter, so its position relative to every stateful filter is preserved).
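The reordering-barrier rule can be sketched roughly as follows. This is only an illustration, not the engine's implementation: `FilterSpec`, its `stateful` flag, and the `cost` estimate are hypothetical names invented for the example. Stateless filters are re-ordered (here, cheapest first) only within the run between two stateful filters, and stateful filters never move.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FilterBarrierSketch {
    /** Hypothetical description of a filter; not an actual engine type. */
    static final class FilterSpec {
        final String name;
        final boolean stateful;
        final int cost; // hypothetical relative cost estimate

        FilterSpec(String name, boolean stateful, int cost) {
            this.name = name;
            this.stateful = stateful;
            this.cost = cost;
        }
    }

    /** Re-orders stateless filters by cost, treating every stateful filter as a barrier. */
    static List<FilterSpec> reorder(List<FilterSpec> filters) {
        List<FilterSpec> result = new ArrayList<>(filters.size());
        List<FilterSpec> run = new ArrayList<>(); // current run of stateless filters
        for (FilterSpec f : filters) {
            if (f.stateful) {
                flush(run, result); // emit everything that preceded the barrier
                result.add(f);      // the stateful filter keeps its position
            } else {
                run.add(f);
            }
        }
        flush(run, result);
        return result;
    }

    /** Sorts the pending stateless run (stable sort) and appends it to the output. */
    private static void flush(List<FilterSpec> run, List<FilterSpec> out) {
        run.sort(Comparator.comparingInt(f -> f.cost));
        out.addAll(run);
        run.clear();
    }

    static List<String> names(List<FilterSpec> filters) {
        List<String> names = new ArrayList<>();
        for (FilterSpec f : filters) {
            names.add(f.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<FilterSpec> input = Arrays.asList(
                new FilterSpec("a", false, 5),
                new FilterSpec("b", false, 1),
                new FilterSpec("s", true, 9),
                new FilterSpec("c", false, 3),
                new FilterSpec("d", false, 2));
        System.out.println(names(reorder(input)));
    }
}
```

In this sketch, `a` and `b` may swap with each other, as may `c` and `d`, but none of them can cross the stateful filter `s`.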
Compile-Latency Optimizations Worth Looking At:
Don't compile formulas that are being replaced with static results. (e.g. A = NULL_INT)
Consider re-using a formula for simple lambda expressions. (e.g. B = A.getId())
Consider re-using a JavaFileManager when compiling multiple formulas at once. (See #4814)
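For the JavaFileManager idea, a minimal sketch using only the standard `javax.tools` API might look like the following. It is not the engine's actual compile path: `SharedFileManagerSketch`, `StringSource`, `compileAll`, and the `FormulaA`/`FormulaB` class names are assumptions made for the example. The point it illustrates is that a single `StandardJavaFileManager` is opened once and passed to every compilation task, rather than being created per formula.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.StandardLocation;
import javax.tools.ToolProvider;

public class SharedFileManagerSketch {
    /** In-memory source file; hypothetical helper for this example. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.SOURCE.extension), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles each (className, source) pair with one shared file manager; returns the success count. */
    static int compileAll(Map<String, String> sources) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available; a JDK is required");
        }
        int succeeded = 0;
        // The file manager is created once and re-used by every task below.
        try (StandardJavaFileManager fileManager = compiler.getStandardFileManager(null, null, null)) {
            // Write class files to a temporary directory instead of the working directory.
            fileManager.setLocation(StandardLocation.CLASS_OUTPUT,
                    Collections.singletonList(Files.createTempDirectory("formulas").toFile()));
            for (Map.Entry<String, String> entry : sources.entrySet()) {
                JavaCompiler.CompilationTask task = compiler.getTask(
                        null, fileManager, null, null, null,
                        Collections.singletonList(new StringSource(entry.getKey(), entry.getValue())));
                if (task.call()) {
                    succeeded++;
                }
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("FormulaA", "public class FormulaA { public static int eval(int a) { return a + 1; } }");
        sources.put("FormulaB", "public class FormulaB { public static int eval(int a) { return a * 2; } }");
        System.out.println("compiled: " + compileAll(sources));
    }
}
```

Whether this helps compile latency in practice would need to be measured; the sketch only shows the shape of the re-use.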