apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Optimize CASE expression for "expr or expr" usage #11638

Open andygrove opened 1 month ago

andygrove commented 1 month ago

Is your feature request related to a problem or challenge?

In DataFusion Comet, we have a custom IfExpr for IF(condition, true_expr, false_expr). In https://github.com/apache/datafusion-comet/pull/681 we removed its evaluate implementation and now delegate to CaseExpr instead. This resulted in great performance improvements for the "column or null" and "scalar or scalar" cases, thanks to recent optimizations in DataFusion, but caused a small regression for the "expr or expr" case.
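To make the delegation concrete, here is a toy, row-at-a-time model of how `IF(cond, t, f)` can be rewritten as a single-branch `CASE WHEN cond THEN t ELSE f END` and evaluated by a generic CASE loop that tracks which rows are still unmatched. Everything here (`Batch`, `Case`, `if_expr`, the `Vec<Option<i64>>` column type) is hypothetical; DataFusion's `CaseExpr` works on Arrow arrays and boolean masks, and its actual code differs.

```rust
// Toy, row-oriented model of CASE evaluation; not DataFusion's actual code.
// A "batch" is a list of columns, each column a Vec<Option<i64>>.
type Batch = Vec<Vec<Option<i64>>>;
type PredFn = fn(&Batch, usize) -> Option<bool>;
type ValueFn = fn(&Batch, usize) -> Option<i64>;

/// Simplified CASE WHEN p1 THEN e1 [WHEN ...] [ELSE e] END (no base expression).
struct Case {
    when_then: Vec<(PredFn, ValueFn)>,
    else_expr: Option<ValueFn>,
}

impl Case {
    /// Generic evaluation: each WHEN runs only on rows not matched yet,
    /// each THEN only on the rows its WHEN selected, ELSE on the leftovers.
    fn evaluate(&self, batch: &Batch, num_rows: usize) -> Vec<Option<i64>> {
        let mut result: Vec<Option<i64>> = vec![None; num_rows];
        // Rows not yet claimed by any WHEN branch.
        let mut remainder: Vec<usize> = (0..num_rows).collect();
        for (when, then) in &self.when_then {
            // A NULL predicate counts as "no match", as in SQL.
            let (matched, rest): (Vec<usize>, Vec<usize>) = remainder
                .into_iter()
                .partition(|&row| when(batch, row) == Some(true));
            // Scatter THEN results back into their original row positions.
            for row in matched {
                result[row] = then(batch, row);
            }
            remainder = rest;
        }
        if let Some(else_expr) = self.else_expr {
            for row in remainder {
                result[row] = else_expr(batch, row);
            }
        }
        result
    }
}

/// IF(cond, t, f) expressed as CASE WHEN cond THEN t ELSE f END.
fn if_expr(cond: PredFn, then_expr: ValueFn, else_expr: ValueFn) -> Case {
    Case { when_then: vec![(cond, then_expr)], else_expr: Some(else_expr) }
}

fn main() {
    // Columns: c1 (predicate input), c2 (THEN), c3 (ELSE).
    let batch: Batch = vec![
        vec![Some(1), Some(5), None, Some(7)],
        vec![Some(10), Some(20), Some(30), Some(40)],
        vec![Some(-1), Some(-2), Some(-3), None],
    ];
    // IF(c1 > 3, c2, c3)
    let expr = if_expr(|b, r| b[0][r].map(|v| v > 3), |b, r| b[1][r], |b, r| b[2][r]);
    // prints [Some(-1), Some(20), Some(-3), Some(40)]
    println!("{:?}", expr.evaluate(&batch, 4));
}
```

In this toy model, the per-branch bookkeeping (partitioning the remaining rows and scattering results back) is general enough for any number of WHEN branches, but with exactly one WHEN and an ELSE it does more work than a simple two-way select needs.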

Describe the solution you'd like

I would like to see whether we can optimize the "expr or expr" case, drawing on the original IfExpr implementation.
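One possible direction, sketched under the same toy assumptions: a dedicated two-branch path that evaluates the predicate once, evaluates each branch only on the rows that take it, and merges the two results in a single pass with a row-wise select similar in spirit to Arrow's `zip` kernel. The helpers `eval_on_rows`, `zip_select`, and `case_expr_or_expr` are hypothetical and are not taken from Comet's `IfExpr` or from DataFusion.

```rust
// Toy sketch of a dedicated "expr or expr" path for
// CASE WHEN p THEN t ELSE f END; hypothetical, not DataFusion or Comet code.
type Batch = Vec<Vec<Option<i64>>>;

/// Evaluate `f` only on rows where `selection[row]` is true; other rows become None.
fn eval_on_rows(
    f: impl Fn(&Batch, usize) -> Option<i64>,
    batch: &Batch,
    selection: &[bool],
) -> Vec<Option<i64>> {
    selection
        .iter()
        .enumerate()
        .map(|(row, &selected)| if selected { f(batch, row) } else { None })
        .collect()
}

/// Row-wise select: take `truthy[i]` where `mask[i]` is true, else `falsy[i]`.
fn zip_select(mask: &[bool], truthy: &[Option<i64>], falsy: &[Option<i64>]) -> Vec<Option<i64>> {
    mask.iter()
        .zip(truthy)
        .zip(falsy)
        .map(|((&m, &t), &f)| if m { t } else { f })
        .collect()
}

/// One predicate pass, one selective pass per branch, one merge pass.
fn case_expr_or_expr(
    batch: &Batch,
    num_rows: usize,
    predicate: impl Fn(&Batch, usize) -> Option<bool>,
    then_expr: impl Fn(&Batch, usize) -> Option<i64>,
    else_expr: impl Fn(&Batch, usize) -> Option<i64>,
) -> Vec<Option<i64>> {
    // NULL predicate results fall through to the ELSE branch, as in SQL.
    let mask: Vec<bool> = (0..num_rows).map(|row| predicate(batch, row) == Some(true)).collect();
    let inverted: Vec<bool> = mask.iter().map(|&m| !m).collect();
    let then_values = eval_on_rows(then_expr, batch, &mask);
    let else_values = eval_on_rows(else_expr, batch, &inverted);
    zip_select(&mask, &then_values, &else_values)
}

fn main() {
    let batch: Batch = vec![
        vec![Some(1), Some(5), None, Some(7)],
        vec![Some(10), Some(20), Some(30), Some(40)],
        vec![Some(-1), Some(-2), Some(-3), None],
    ];
    // CASE WHEN c1 > 3 THEN c2 ELSE c3 END
    let out = case_expr_or_expr(
        &batch,
        4,
        |b, r| b[0][r].map(|v| v > 3),
        |b, r| b[1][r],
        |b, r| b[2][r],
    );
    // prints [Some(-1), Some(20), Some(-3), Some(40)]
    println!("{:?}", out);
}
```

Evaluating each branch only on its selected rows keeps CASE's short-circuit behaviour: a branch never runs on rows that do not take it, which matters for fallible expressions such as division. If both branches were known to be infallible, evaluating them over the whole batch and zipping would be simpler still.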

Describe alternatives you've considered

No response

Additional context

No response

jatin510 commented 1 month ago

take

jatin510 commented 1 month ago

Hi, here is the benchmark we were running in Apache DataFusion Comet:

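```rust
use std::sync::Arc;
use datafusion_physical_expr::expressions::CaseExpr;

// IF(predicate, c2, c3) as CASE WHEN predicate THEN c2 ELSE c3 END.
// `predicate` and `make_col` come from the benchmark setup.
let expr = Arc::new(
    CaseExpr::try_new(
        None,                                         // no base expression after CASE
        vec![(predicate.clone(), make_col("c2", 1))], // WHEN predicate THEN c2
        Some(make_col("c3", 2)),                      // ELSE c3
    )
    .unwrap(),
);
```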

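For this expression, `CaseExpr` picks the `NoExpression` evaluation method: there is no base expression after `CASE`, and the THEN/ELSE pair is neither "column or null" nor "scalar or scalar", so it takes the generic path.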
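Why this benchmark lands on the generic path can be illustrated with a simplified, hypothetical model of strategy selection. The enum variants and matching rules below are illustrative only (`ExprOrExpr` stands for the proposed dedicated path); DataFusion's actual enum and selection logic may differ.

```rust
// Hypothetical model of how a CASE node might choose an evaluation strategy.
#[derive(Debug, PartialEq)]
enum EvalMethod {
    WithExpression, // CASE <expr> WHEN <value> THEN ... (base expression present)
    ColumnOrNull,   // CASE WHEN p THEN <column> [ELSE NULL] END
    ScalarOrScalar, // CASE WHEN p THEN <literal> ELSE <literal> END
    ExprOrExpr,     // proposed: CASE WHEN p THEN <expr> ELSE <expr> END
    NoExpression,   // generic fallback
}

/// Coarse classification of a THEN/ELSE operand.
enum Operand {
    Column(String),
    Literal(i64),
    Null,
    Other, // any other expression, e.g. c2 + 1
}

struct CaseShape {
    has_base_expr: bool,
    then_exprs: Vec<Operand>, // THEN operand of each WHEN branch
    else_expr: Option<Operand>,
}

/// `with_expr_or_expr` toggles the hypothetical dedicated path.
fn choose(shape: &CaseShape, with_expr_or_expr: bool) -> EvalMethod {
    if shape.has_base_expr {
        return EvalMethod::WithExpression;
    }
    if shape.then_exprs.len() != 1 {
        return EvalMethod::NoExpression;
    }
    match (&shape.then_exprs[0], &shape.else_expr) {
        (Operand::Column(_), None | Some(Operand::Null)) => EvalMethod::ColumnOrNull,
        (Operand::Literal(_), Some(Operand::Literal(_))) => EvalMethod::ScalarOrScalar,
        (_, Some(_)) if with_expr_or_expr => EvalMethod::ExprOrExpr,
        _ => EvalMethod::NoExpression,
    }
}

fn main() {
    // Shape of the benchmark: CASE WHEN predicate THEN c2 ELSE c3 END.
    let bench = CaseShape {
        has_base_expr: false,
        then_exprs: vec![Operand::Column("c2".into())],
        else_expr: Some(Operand::Column("c3".into())),
    };
    println!("{:?}", choose(&bench, false)); // NoExpression
    println!("{:?}", choose(&bench, true)); // ExprOrExpr

    // CASE WHEN predicate THEN c2 + 1 ELSE NULL END also falls through today.
    let other = CaseShape {
        has_base_expr: false,
        then_exprs: vec![Operand::Other],
        else_expr: Some(Operand::Null),
    };
    println!("{:?}", choose(&other, false)); // NoExpression
}
```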

I compared the code for `NoExpression`, `WithExpression`, and the other evaluation methods, but didn't find much of a difference.

Could someone please guide me on how to approach the performance optimization for this scenario?