apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.87k stars 1.11k forks source link

Add has_side_effects to PhysicalExpr #11490

Open andygrove opened 1 month ago

andygrove commented 1 month ago

Is your feature request related to a problem or challenge?

For expressions such as IF(predicate, true_expr, false_expr) or CASE WHEN predicate THEN true_expr ELSE false_expr END, we generally have to be careful to only evaluate the true_expr against rows where the predicate evaluated to true.

However, if we know that true_expr has no side effects (such as throwing exceptions on invalid inputs) and is inexpensive to compute, then we have the option of evaluating it for all rows instead, which can be considerably cheaper in some cases.

There is currently no way to determine if a PhysicalExpr has side effects. I would like to discuss adding a new method to the trait with a default implementation. Something like

fn has_side_effects(&self) -> bool {
  false
}

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 1 month ago

Maybe we could consider moving more of the physical expr implementations into ScalarUDFImpl user defined functions rather than extending more built in code 🤔

andygrove commented 1 month ago

I just discovered that we have this for logical expressions, related to this discussion:

    /// Returns true if some of this `exprs` subexpressions may not be evaluated
    /// and thus any side effects (like divide by zero) may not be encountered
    pub fn short_circuits(&self) -> bool {
andygrove commented 1 month ago

Also in ScalarUDFImpl trait:

    /// Returns true if some of this `exprs` subexpressions may not be evaluated
    /// and thus any side effects (like divide by zero) may not be encountered
    /// Setting this to true prevents certain optimizations such as common subexpression elimination
    fn short_circuits(&self) -> bool {
        false
    }