apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6.3k stars 1.19k forks source link

DataFrame parse_sql_expr does not handle aliases #12518

Open milenkovicm opened 1 month ago

milenkovicm commented 1 month ago

Describe the bug

df.parse_sql_expr does not handle expression aliases, for expressions like

df.parse_sql_expr("col0 as new_col0")

it will return col(col0`) expression

To Reproduce

statement

df.parse_sql_expr("SUM(int_col) as sum_int_col")

from

https://github.com/apache/datafusion/blob/aeca7ea99940401573eb14601c74316279bbdae5/datafusion-examples/examples/parse_sql_expr.rs#L125

can be used to demonstrate this issue

as return column has sum(?table?.int_col) as the name

Expected behavior

it is expected to have expression alias handled

Additional context

No response

Eason0729 commented 1 month ago

I spent some time on tracing code but found unable to fix it. Maybe we need to change some code in sqlparser, correct me if I am wrong.

Detail To be specific, I run following with debugger attached
let df = ctx.sql("SELECT 1 as col0").await?;
let expr = df.parse_sql_expr("1 + 3 * col0 as new_col0")?;
And found [this line](https://github.com/apache/datafusion/blob/1e31093fc96d70f27d79935f616af31f1994b2e1/datafusion/sql/src/parser.rs#L388), whose value return from `sqlparser` to be following. > It doesn't contain `new_col0`.
BinaryOp { left: Value(Number("1", false)), op: Plus, right: BinaryOp { left: Value(Number("3", false)), op: Multiply, right: Identifier(Ident { value: "col0", quote_style: None }) } }
Eason0729 commented 1 month ago

https://github.com/sqlparser-rs/sqlparser-rs/issues/1439

milenkovicm commented 1 month ago

I think you're right @Eason0729 I did not have much time to investigate but initial investigation lead to sqlparser

Eason0729 commented 1 month ago

There is a private method parse_expr_with_alias in sqlparser, which returns ExprWithAlias.

I would propose changing the return type from sqlparser::ast::Expr on call path to sqlparser::ast::ExprWithAlias, use enum datafusion_expr::expr::Expr::Alias in return type.

Writing a example change might be better if more description in needed

This could cause other method to change(including some public methods on SessionState).

If above is okay, I would begin working on it.

milenkovicm commented 1 month ago

parse_sql_expr returns expression, alias is an expression, why would you need to change public method in SessionState?

Eason0729 commented 1 month ago

Mainly because it could be done without changing sqlparser.

I think I should start writing code right away, as it would be better to communicate with code.

I would do it in my fork and let you know when I am ready, then I might learn how to do it without changing public method in SessionState.

Eason0729 commented 1 month ago

I just finish it in my fork.

It's not meant to be a pull request, just to showcase what I would like to change.

Here is the public API I change: diff

Also, I change sqlparser of my fork to make Parser::parse_expr_with_alias public: https://github.com/Eason0729/sqlparser-rs/commit/c6360b159f9e9baade08ab2e4d7a12083d24a567

milenkovicm commented 1 month ago

It does look good, @Eason0729. maybe @alamb and @comphead can give you better advice than me when they get some time

Eason0729 commented 1 month ago

Okay, then I will remove thing that doesn't belong to pull request and submit pull request.

alamb commented 1 month ago

PR in sqlparser: https://github.com/sqlparser-rs/sqlparser-rs/pull/1444

Eason0729 commented 1 month ago

Let's wait for https://github.com/sqlparser-rs/sqlparser-rs/pull/1444, then I will submit a PR for datafusion.

alamb commented 1 month ago

Thank you @Eason0729 -- we'll try and release a sqlparser release shortly

milenkovicm commented 2 weeks ago

Hey @Eason0729 any interested in continuing this issue?

Eason0729 commented 2 weeks ago

@milenkovicm Yes, I am interested in working on this issue.

I am waiting for next release of sqlparser, so parse_expr_with_alias method on sqlparser::parser::Parser can be make public.

milenkovicm commented 2 weeks ago

My bad, I thought it's released

alamb commented 1 week ago

My bad, I thought it's released

It is on my queue: https://github.com/apache/datafusion-sqlparser-rs/issues/1423