apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

[Epic] Complete pulling out special SQL planning from the Sql Parser #11207

Closed alamb closed 1 month ago

alamb commented 1 month ago

Is your feature request related to a problem or challenge?

As discussed in https://github.com/apache/datafusion/issues/10534, @jayzhan211 added a UserDefinedSQLPlanner in https://github.com/apache/datafusion/pull/11180 so that the translation of certain SQL syntax to LogicalPlans and Exprs is not hard coded in SqlToRel but is instead controlled by a UserDefinedSQLPlanner.

Now that we have the pattern, we need to move the remaining hard-coded functionality in SqlToRel (e.g. looking up a function "date_part" by name) to the UserDefinedSQLPlanner.
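To make the pattern concrete, here is a minimal, self-contained sketch. All of the types below (`SqlExpr`, `Expr`, `PlannerResult`, `SqlPlanner`, `DatePartPlanner`, `NoopPlanner`, `sql_to_expr`) are toy stand-ins invented for illustration and are not DataFusion's real API. The idea is only that the core translator no longer knows the name "date_part"; it asks a chain of planners, and each planner either produces an expression or hands the arguments back untouched.

```rust
// Toy model of the planner-chain pattern. Hypothetical types, not DataFusion's API.

/// Stand-in for a parsed SQL expression.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlExpr {
    Identifier(String),
    /// Models `EXTRACT(field FROM expr)`.
    Extract { field: String, expr: Box<SqlExpr> },
}

/// Stand-in for a logical expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(String),
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Either the planner handled the input, or it returns the input unchanged.
pub enum PlannerResult<T> {
    Planned(Expr),
    Original(T),
}

pub trait SqlPlanner {
    /// Default: decline, so the next planner in the chain gets a chance.
    fn plan_extract(&self, args: Vec<Expr>) -> PlannerResult<Vec<Expr>> {
        PlannerResult::Original(args)
    }
}

/// A planner that maps EXTRACT onto a function call named "date_part".
pub struct DatePartPlanner;

impl SqlPlanner for DatePartPlanner {
    fn plan_extract(&self, args: Vec<Expr>) -> PlannerResult<Vec<Expr>> {
        PlannerResult::Planned(Expr::ScalarFunction {
            name: "date_part".to_string(),
            args,
        })
    }
}

/// A planner that relies on the default (declining) implementation.
pub struct NoopPlanner;

impl SqlPlanner for NoopPlanner {}

/// The core translator: it contains no function names, it only consults planners.
pub fn sql_to_expr(sql: &SqlExpr, planners: &[Box<dyn SqlPlanner>]) -> Result<Expr, String> {
    match sql {
        SqlExpr::Identifier(name) => Ok(Expr::Column(name.clone())),
        SqlExpr::Extract { field, expr } => {
            let mut args = vec![Expr::Literal(field.clone()), sql_to_expr(expr, planners)?];
            for planner in planners {
                match planner.plan_extract(args) {
                    PlannerResult::Planned(planned) => return Ok(planned),
                    // Not handled: take the arguments back and try the next planner.
                    PlannerResult::Original(returned) => args = returned,
                }
            }
            Err("no planner handled EXTRACT".to_string())
        }
    }
}
```

In this sketch, registering `DatePartPlanner` is what makes `EXTRACT` resolvable; with only `NoopPlanner` registered, `sql_to_expr` returns an error rather than falling back to a hard-coded lookup.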

Describe the solution you'd like

To rewrite with sql planner

Describe alternatives you've considered

No response

Additional context

Discussion is here: https://github.com/apache/datafusion/issues/10534

samuelcolvin commented 1 month ago

#11208 allows user-defined SQL planners to be defined.

samuelcolvin commented 1 month ago

See https://github.com/datafusion-contrib/datafusion-functions-json/pull/26 - support for custom SQL operators in datafusion-functions-json using #11208.

dharanad commented 1 month ago

Hello @alamb, just checking in on the remaining tasks. Is there anything specific we're waiting on before we create issues? If we're all set, I would be happy to jump in and pick up a few tasks.

alamb commented 1 month ago

Hi @dharanad, I don't think we are waiting on anything from my perspective. Thank you for offering.

In fact it seems as if @xinlifoobar has already started with https://github.com/apache/datafusion/pull/11215 ❤️

dharanad commented 1 month ago

I've created issues for a couple of tasks. Please let me know if you think anything needs updating in the descriptions. I'm new here and learning from shadowing the experienced folks

alamb commented 1 month ago

thank you @dharanad -- this is very helpful 🙏

alamb commented 1 month ago

FWIW in general @dharanad I have had the best luck with writing a description on tickets that requires as little context as possible (aka distill what is needed into the description, rather than assuming the new contributor will read the epic and get all the backstory)

The rationale for this duplication is to lower the barrier to new contributors

dharanad commented 1 month ago

Thanks for the feedback! I really appreciate it. You're right, making the ticket description concise and self-contained will definitely help reduce the barrier for new contributors. I'll update the descriptions to include the necessary context. Thank you.

dharanad commented 1 month ago

Created issues for the remaining tasks and tried adding a description to each based on my understanding of the issue. Also updated the descriptions of the older ones in the same way.

samuelcolvin commented 1 month ago

Given how much UserDefinedSQLPlanner is being used for existing stuff within datafusion, perhaps it should be called just SQLPlanner or CustomSQLPlanner?

alamb commented 1 month ago

I agree

Or maybe something like ExprPlanner 🤔 as it is being used to plan specific exprs.

samuelcolvin commented 1 month ago

ExprPlanner sounds good.

xinlifoobar commented 1 month ago

Given #11220 and #11243, those are very similar APIs with UDF plans. I am trying to draft an API, e.g.,

    // Plan the user-defined function; returns the original expression arguments if planning is not possible
    fn plan_udf(
        &self,
        _sql: &sqlparser::ast::Expr,
        args: Vec<Expr>,
    ) -> Result<PlannerResult<Vec<Expr>>> {
        Ok(PlannerResult::Original(args))
    }

to unify the usages.

I have created a draft PR #11263 to discuss this. The flaw here is that the parameter sql is partially borrowed and has to be cloned at the very beginning. Maybe we should consider using references if possible.
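A small sketch of the ownership problem described above, assuming it is a partial move: once a field has been moved out of an owned AST node, the node can no longer be borrowed as a whole, so it has to be cloned first. The types and functions here (`SqlExpr`, `plan_udf`, `plan_owned`, `plan_borrowed`) are toy stand-ins and not the sqlparser or DataFusion signatures; uppercasing a string stands in for planning an argument.

```rust
// Toy AST node. Hypothetical, not sqlparser's real `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum SqlExpr {
    Function { name: String, args: Vec<String> },
}

/// Hypothetical hook that wants to see the whole original node plus the
/// already-planned arguments.
pub fn plan_udf(sql: &SqlExpr, planned_args: Vec<String>) -> String {
    let SqlExpr::Function { name, .. } = sql;
    format!("{}({})", name, planned_args.join(", "))
}

/// Variant A: the caller owns the node and destructures it by value.
/// Moving `args` out leaves `sql` partially moved, so `&sql` is no longer
/// usable and the whole node must be cloned up front.
pub fn plan_owned(sql: SqlExpr) -> String {
    let original = sql.clone(); // extra deep copy of the AST node
    let SqlExpr::Function { args, .. } = sql; // moves `args` out of `sql`
    let planned: Vec<String> = args.into_iter().map(|a| a.to_uppercase()).collect();
    plan_udf(&original, planned)
}

/// Variant B: the caller only borrows the node, so nothing is moved out and
/// the same reference can be passed on to the hook without cloning the node.
pub fn plan_borrowed(sql: &SqlExpr) -> String {
    let SqlExpr::Function { args, .. } = sql; // `args` is `&Vec<String>` here
    let planned: Vec<String> = args.iter().map(|a| a.to_uppercase()).collect();
    plan_udf(sql, planned)
}
```

Both variants produce the same result; the difference is only that variant A pays for a clone of the node, which is the cost that working with references throughout would avoid.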

xinlifoobar commented 1 month ago

I ended up opening #11263; please let me know your thoughts. Thanks :)

CC @jayzhan211 @dharanad @alamb

alamb commented 1 month ago

> ExprPlanner sounds good.

Filed https://github.com/apache/datafusion/issues/11304

alamb commented 1 month ago

I think we are pretty close to calling this done.

I just double checked and sql_compound_identifier_to_expr is the only thing that needs this treatment to remove the call to get_function_meta:

https://github.com/apache/datafusion/blob/bfd815622f1fe2c84d6fab32596b83ffbe52a84a/datafusion/sql/src/expr/identifier.rs#L138-L139

That appears to be the last remaining use: https://github.com/search?q=repo%3Aapache%2Fdatafusion+get_function_meta+path%3A%2F%5Edatafusion%5C%2Fsql%5C%2F%2F&type=code

alamb commented 1 month ago

Filed https://github.com/apache/datafusion/issues/11473

alamb commented 1 month ago

I think we can claim we are done 🎉

thanks everyone