apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.33k stars 1.2k forks source link

Add `Container` trait and to simplify `Expr` and `LogicalPlan` apply and map methods #13467

Closed peter-toth closed 2 days ago

peter-toth commented 3 days ago

Which issue does this PR close?

Part of https://github.com/apache/datafusion/issues/8913.

Rationale for this change

The current implementation of LogicalPlan:apply_children(), LogicalPlan::map_children(), LogicalPlan::apply_expressions(), LogicalPlan::map_expressions(), Expr::apply_children() and Expr::map_children() are confusing due the map_until_stop_and_collect macro. I think we can introduce a trait that can contain arbitrary sibling elements that functions can be applied on and mapped:

/// [`TreeNodeContainer`] contains elements that a function can be applied on or mapped. The
/// elements of the container are siblings so the continuation rules are similar to
/// [`TreeNodeRecursion::visit_sibling`] / [`Transformed::transform_sibling`].
pub trait TreeNodeContainer<'a, T: 'a>: Sized {
    fn apply_elements<F: FnMut(&'a T) -> Result<TreeNodeRecursion>>(
        &'a self,
        f: F,
    ) -> Result<TreeNodeRecursion>;

    fn map_elements<F: FnMut(T) -> Result<Transformed<T>>>(
        self,
        f: F,
    ) -> Result<Transformed<Self>>;
}

As example for the new trait usage, this is how Expr::map_children() handled Expr::Case before this PR:

Expr::Case(Case {
    expr,           // Option<Box<Expr>>,
    when_then_expr, // Vec<(Box<Expr>, Box<Expr>)>
    else_expr,      // Option<Box<Expr>>,
}) => map_until_stop_and_collect!(
    transform_option_box(expr, &mut f),
    when_then_expr,
    when_then_expr
        .into_iter()
        .map_until_stop_and_collect(|(when, then)| {
            map_until_stop_and_collect!(
                transform_box(when, &mut f),
                then,
                transform_box(then, &mut f)
            )
        }),
        else_expr,
        transform_option_box(else_expr, &mut f)
)?...

and this is how it is handled with this PR:

Expr::Case(Case {
    expr,           // Option<Box<Expr>>,
    when_then_expr, // Vec<(Box<Expr>, Box<Expr>)>
    else_expr,      // Option<Box<Expr>>,
}) => (expr, when_then_expr, else_expr).map_elements(f)?...

Please see the type of the Case struct fields in comments. Let's focus on when_then_expr: Vec<(Box<Expr>, Box<Expr>)> first. The composable nature of TreeNodeContainer and the blanket implementation we provide for Box, 2-tuple and Vec makes it possible to just call when_then_expr. map_elements(f) to map all the expressions in when_then_expr. But we can go further and replace the map_until_stop_and_collect macro to deal with all 3 fields of Case. We can put them into a 3-tuple and can call .map_elements() as we have blanket implementation for 3-tuple as well. (Note that we can't use Vec for this as the type of the 3 fields differ.)

What changes are included in this PR?

This PR:

Are these changes tested?

Yes, with exitsing UTs.

Are there any user-facing changes?

No.

findepi commented 3 days ago

This PR:

  • Gets rid of map_until_stop_and_collect macro

that's great

Adds the Container trait and blanket implementations for Box, Option, Vec, tuples, ...

Do we need that?

What about calling this trait TreeNode or GraphNode and implementing it for our types only?

peter-toth commented 3 days ago

Adds the Container trait and blanket implementations for Box, Option, Vec, tuples, ...

Do we need that?

What about calling this trait TreeNode or GraphNode and implementing it for our types only?

Yes, we do. We already have the TreeNode trait with a well defined API. It has TreeNode::apply_children() / TreeNode::map_children() to let the tree implementations define how to visit/map the children of a node.

This new Container trait with the above mentioned blankets just make the implementation of that apply_children() / map_children() of logical plan trees (Expr and LogicalPlan) simpler. We can actually move Container and its blanket implementations to datafusion::expr if that's cleaner.

peter-toth commented 3 days ago

cc @alamb, @berkaysynnada

peter-toth commented 2 days ago

~Let me check a few more things before we merge this PR. There might be a way to make this even simpler.~

NVM.

peter-toth commented 2 days ago

Thanks for the review @alamb , @berkaysynnada , @findepi!