apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Convert Expr to a parsable representation #7165

Open Blajda opened 1 year ago

Blajda commented 1 year ago

Is your feature request related to a problem or challenge?

In the delta-rs project we support operations such as delete, update, and merge, where users can supply a predicate as either a string or a DataFusion Expr. String predicates go through sql-parser to obtain an Expr, which is then evaluated. At the end of each operation the expression must be converted back to a string so it can be stored in the transaction log for conflict resolution.

The implementations of create_name and canonical_name almost fit this need, but they wrap scalar values in their type name, which sql-parser cannot parse.

E.g., col1 = 1 becomes col1 = Int32(1)

Describe the solution you'd like

Given an Expr, one should be able to obtain its string representation in a form that sql-parser can parse.

Describe alternatives you've considered

No response

Additional context

No response

parkma99 commented 1 year ago

I made a PR, #6708, about printing Expr two weeks ago. Could you give more information? Thanks, I will work on it tonight.

Blajda commented 1 year ago

Thanks @parkma99

In delta-rs we use this method to convert a SQL expression to an Expr. We can almost convert it back to a SQL string using canonical_name; however, the result fails to parse with the above code. This is because the method uses the Debug format for scalar values, which produces output like Int32(1). The parser then treats Int32(1) as a function call.

I would like to have a function like canonical_name, except that it formats scalar values using their Display value.
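To make the request concrete, here is a minimal sketch of the difference between the two formatting strategies. It does not use DataFusion: `Scalar`, `MiniExpr`, `debug_name`, and `parsable_name` are hypothetical stand-ins invented for illustration, and the real `Expr` and `ScalarValue` types have many more variants.

```rust
use std::fmt;

/// Hypothetical stand-in for DataFusion's `ScalarValue` (two variants only).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(i32),
    Utf8(String),
}

/// Hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum MiniExpr {
    Column(String),
    Literal(Scalar),
    BinaryOp {
        left: Box<MiniExpr>,
        op: &'static str,
        right: Box<MiniExpr>,
    },
}

impl fmt::Display for Scalar {
    /// Render the scalar as a SQL literal, without its type name.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Scalar::Int32(v) => write!(f, "{v}"),
            // Quote strings and double any embedded single quotes.
            Scalar::Utf8(s) => write!(f, "'{}'", s.replace('\'', "''")),
        }
    }
}

/// Mimics the behaviour described in the issue: scalars use `Debug`,
/// so the type name leaks into the output.
fn debug_name(e: &MiniExpr) -> String {
    match e {
        MiniExpr::Column(c) => c.clone(),
        MiniExpr::Literal(s) => format!("{s:?}"),
        MiniExpr::BinaryOp { left, op, right } => {
            format!("{} {} {}", debug_name(left), op, debug_name(right))
        }
    }
}

/// The requested behaviour: same traversal, but scalars use `Display`.
fn parsable_name(e: &MiniExpr) -> String {
    match e {
        MiniExpr::Column(c) => c.clone(),
        MiniExpr::Literal(s) => format!("{s}"),
        MiniExpr::BinaryOp { left, op, right } => {
            format!("{} {} {}", parsable_name(left), op, parsable_name(right))
        }
    }
}
```

This sketch ignores operator precedence and identifier quoting; a real implementation would need to parenthesize nested binary expressions and quote column names where required.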

parkma99 commented 1 year ago

cc @alamb. Do we need a new function to convert an Expr to a parsable representation, or should we just change the canonical_name function? Thanks 😊

alamb commented 1 year ago

We could potentially propose changing canonical_name. However, as I recall, that function is used to create column names, so changing it might have a large impact.

Another possibility, which is somewhat of a hack, might be to add an expr rewrite pass that, after parsing something like c = Int32(1), rewrites Function: int32(arg) --> ScalarValue::Int32
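A rough sketch of what such a rewrite pass could look like, written against a toy expression tree rather than DataFusion's actual `Expr` and rewriter traits. `ParsedExpr` and `rewrite_typed_literals` are hypothetical names, and the assumption that the parser yields a 64-bit integer literal for a bare number is made only for illustration.

```rust
/// Hypothetical, simplified expression tree as it might look after parsing
/// the string "c = Int32(1)". Not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ParsedExpr {
    Column(String),
    /// Assumed type of a bare integer literal produced by the parser.
    Int64Literal(i64),
    /// The typed scalar the rewrite is trying to recover.
    Int32Literal(i32),
    Function { name: String, args: Vec<ParsedExpr> },
    BinaryOp {
        left: Box<ParsedExpr>,
        op: &'static str,
        right: Box<ParsedExpr>,
    },
}

/// Walk the tree bottom-up and replace calls such as `Int32(1)` with the
/// typed literal they were originally printed from.
fn rewrite_typed_literals(e: ParsedExpr) -> ParsedExpr {
    match e {
        ParsedExpr::Function { name, args } => {
            // Rewrite the arguments first so nested calls are handled too.
            let args: Vec<ParsedExpr> =
                args.into_iter().map(rewrite_typed_literals).collect();
            if name.eq_ignore_ascii_case("int32") {
                if let [ParsedExpr::Int64Literal(v)] = args.as_slice() {
                    // Only rewrite when the value actually fits in an i32.
                    if let Ok(v) = i32::try_from(*v) {
                        return ParsedExpr::Int32Literal(v);
                    }
                }
            }
            // Anything else stays an ordinary function call.
            ParsedExpr::Function { name, args }
        }
        ParsedExpr::BinaryOp { left, op, right } => ParsedExpr::BinaryOp {
            left: Box::new(rewrite_typed_literals(*left)),
            op,
            right: Box::new(rewrite_typed_literals(*right)),
        },
        other => other,
    }
}
```

The hack has an obvious limitation: a user-defined function that happens to be named int32 would be rewritten as well, which is part of why formatting scalars in a parsable way in the first place looks like the cleaner fix.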

parkma99 commented 1 year ago

Another possibility, which is somewhat of a hack, might be to add an expr rewrite pass that, after parsing something like c = Int32(1), rewrites Function: int32(arg) --> ScalarValue::Int32

Just like this example does? https://github.com/apache/arrow-datafusion/blob/02da0445f8c94019c1526d6b1759492fda266cdf/datafusion-examples/examples/rewrite_expr.rs#L163-L187

alamb commented 1 year ago

Just like this example does?

That is a good example of using the rewriter! The actual rewrite rule would be a little different, though.