Closed alamb closed 1 month ago
Here is what I found when surveying this.
I think we can follow how Calcite handles the quoting issue. The SqlDialect of Calcite has a check rule, identifierNeedsQuote.
It can be overridden according to the specific data source, such as BigQuery:
They implement some rules, such as regex patterns or reserved word lists. I think a dialect-specific rule is a nice choice.
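A dialect-specific rule of the kind described above could look roughly like the following. This is only a minimal sketch: the function name `identifier_needs_quote` mirrors the Calcite rule, but the `RESERVED_WORDS` list and the exact pattern (lowercase ASCII letters, digits, and underscores, not starting with a digit) are assumptions made for illustration, not the rules of any real dialect.

```rust
/// Hypothetical, deliberately short reserved-word list.
/// A real dialect would carry a much longer, dialect-specific one.
const RESERVED_WORDS: &[&str] = &["select", "from", "where", "table", "order", "group"];

/// Returns true when `ident` should be quoted in generated SQL.
///
/// Assumed rule: a bare identifier must look like `[a-z_][a-z0-9_]*`
/// and must not be a reserved word. Uppercase letters force quoting
/// because many engines fold unquoted identifiers to one case.
fn identifier_needs_quote(ident: &str) -> bool {
    let mut chars = ident.chars();
    // The first character must be a lowercase letter or an underscore.
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_lowercase() || c == '_',
        None => false, // an empty identifier is always quoted
    };
    // The remaining characters may also be digits.
    let rest_ok = chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    let reserved = RESERVED_WORDS.iter().any(|w| *w == ident);
    !(first_ok && rest_ok) || reserved
}
```

With these assumed rules, `a` and `_col1` are emitted bare, while `select`, `my col`, `MyCol`, and `1abc` are quoted.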
Indeed, there is already a function on the sqlparser::dialect trait that takes this into account:
The dialect specific implementations just need to be expanded on. For now they just always return a conservative quote character.
I think it's the same as the dialect in datafusion::unparser::dialect
https://github.com/apache/datafusion/blob/e7858ff0ab1c282ab46bd93cabc3dc83db583165/datafusion/sql/src/unparser/dialect.rs#L28
However, what we need is a check for whether the identifier needs to be quoted.
I think I can make a PR for DefaultDialect first.
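To make the difference between "always return a conservative quote character" and "quote only when needed" concrete, here is a rough sketch. The trait `QuoteDialect`, both structs, and the helper `needs_quote` are hypothetical names for this example only; they are modeled loosely on the idea discussed here and are not the actual DataFusion or sqlparser-rs API. The check also ignores reserved words to stay short.

```rust
/// Hypothetical trait for this sketch, not the real unparser trait.
trait QuoteDialect {
    /// Quote character to wrap `ident` in, or `None` to emit it bare.
    fn identifier_quote_style(&self, ident: &str) -> Option<char>;
}

/// Conservative behavior: every identifier is quoted.
struct AlwaysQuote;

/// Quotes only identifiers that fail the (assumed) bare-identifier check.
struct QuoteWhenNeeded;

impl QuoteDialect for AlwaysQuote {
    fn identifier_quote_style(&self, _ident: &str) -> Option<char> {
        Some('"')
    }
}

impl QuoteDialect for QuoteWhenNeeded {
    fn identifier_quote_style(&self, ident: &str) -> Option<char> {
        if needs_quote(ident) { Some('"') } else { None }
    }
}

/// Assumed rule: bare identifiers look like `[a-z_][a-z0-9_]*`.
/// Reserved words are not handled in this sketch.
fn needs_quote(ident: &str) -> bool {
    let first_ok = ident
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_lowercase() || c == '_');
    let rest_ok = ident
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    !(first_ok && rest_ok)
}

/// Renders an identifier, doubling any embedded quote characters.
fn render_identifier(dialect: &dyn QuoteDialect, ident: &str) -> String {
    match dialect.identifier_quote_style(ident) {
        Some(q) => {
            let doubled = format!("{}{}", q, q);
            format!("{}{}{}", q, ident.replace(q, &doubled), q)
        }
        None => ident.to_string(),
    }
}
```

Under these assumptions, `render_identifier(&AlwaysQuote, "a")` gives `"a"` with quotes, while `render_identifier(&QuoteWhenNeeded, "a")` gives a bare `a` and still quotes `my col`.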
As mentioned in dialect.rs
https://github.com/apache/datafusion/blob/e7858ff0ab1c282ab46bd93cabc3dc83db583165/datafusion/sql/src/unparser/dialect.rs#L19
I think we need to use the Dialect in sqlparser-rs instead and extract identifier_needs_quote from #10573 into sqlparser-rs, just like https://github.com/sqlparser-rs/sqlparser-rs/pull/1170
Yes, these are basically the same object. The one in DataFusion was put there temporarily until the trait extension in the sqlparser repo has landed and been pushed to crates.io. This may have happened in the meantime.
https://github.com/apache/datafusion/pull/10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade.
We may need a 0.46.1 for resolving the regressions:
I've locally confirmed that array.slt is the last failure for cargo test --test sqllogictests.
I'm not sure, but I think we can merge #10573 first because it also fixes many unparsing tests. Then I'll create a PR for sqlparser to add the check rule to the dialect.
FYI https://github.com/apache/datafusion/pull/10573 is merged!
Should we split off a ticket to reduce the number of brackets emitted?
Excellent call -- I filed https://github.com/apache/datafusion/issues/10633
Is your feature request related to a problem or challenge?
Part of https://github.com/apache/datafusion/issues/9494
As @backkem says https://github.com/apache/datafusion/pull/10528#issuecomment-2116068547 on https://github.com/apache/datafusion/pull/10528
Currently, expressions from the DataFusion SQL unparser (aka expr --> String) are somewhat ugly.
For example, the expression
col("a").eq(lit(5))
would be rendered as
a = 5
by most people if they did it manually, but DataFusion's unparser currently renders it like
"a" = 5
(with extra quotes). DataFusion also puts in parentheses to make the order of operations explicit -- so instead of
a < 5 AND b < 10
it would render
("a" < 5) AND ("b" < 10)
The current unparser is conservative and likely works well when generating SQL for consumption by other database systems. However, the SQL is not as nice for human consumption.
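One way to drop the redundant parentheses is to compare operator precedence while rendering and add parentheses only around a child that binds more loosely than its context requires. The sketch below illustrates the idea on a tiny, hypothetical expression tree; `Expr`, `Op`, `col`, `lit`, `binary`, and the precedence numbers are all assumptions for this example and are not DataFusion's types or any specific dialect's precedence table.

```rust
/// Hypothetical operator set with an assumed precedence order.
#[derive(Clone, Copy, PartialEq)]
enum Op { Or, And, Lt, Eq, Plus }

impl Op {
    /// Higher number binds tighter (assumed ordering).
    fn precedence(self) -> u8 {
        match self { Op::Or => 1, Op::And => 2, Op::Lt | Op::Eq => 3, Op::Plus => 4 }
    }
    fn symbol(self) -> &'static str {
        match self { Op::Or => "OR", Op::And => "AND", Op::Lt => "<", Op::Eq => "=", Op::Plus => "+" }
    }
    /// Comparisons are treated as non-associative in this sketch.
    fn is_comparison(self) -> bool { matches!(self, Op::Lt | Op::Eq) }
}

/// Hypothetical expression tree, far smaller than a real one.
enum Expr {
    Column(String),
    Literal(i64),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn lit(v: i64) -> Expr { Expr::Literal(v) }
fn binary(left: Expr, op: Op, right: Expr) -> Expr {
    Expr::Binary { left: Box::new(left), op, right: Box::new(right) }
}

/// Renders `expr`, adding parentheses only where precedence requires them.
fn to_sql(expr: &Expr) -> String { render(expr, 0) }

/// `min_prec` is the lowest precedence that may appear here without parentheses.
fn render(expr: &Expr, min_prec: u8) -> String {
    match expr {
        Expr::Column(name) => name.clone(), // assumes the name needs no quoting
        Expr::Literal(v) => v.to_string(),
        Expr::Binary { left, op, right } => {
            let prec = op.precedence();
            // Left-associative operators allow equal precedence on the left;
            // comparisons require a strictly tighter child on both sides.
            let left_min = if op.is_comparison() { prec + 1 } else { prec };
            let sql = format!(
                "{} {} {}",
                render(left, left_min),
                op.symbol(),
                render(right, prec + 1)
            );
            if prec < min_prec { format!("({})", sql) } else { sql }
        }
    }
}
```

With this sketch, the tree for `a < 5 AND b < 10` renders without any parentheses, while `(a OR b) AND c` keeps the pair it needs. A production version would need each target dialect's real precedence and associativity rules.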
Here is another instance from the example https://github.com/apache/datafusion/blob/98647e842a85b768ea0cb0f8ccf1016636001abb/datafusion-examples/examples/plan_to_sql.rs#L50-L53
Describe the solution you'd like
If we want to make the generated SQL easier to read by humans / more succinct, these steps will have to be made "smarter".
Describe alternatives you've considered
Potential ideas:
Note that the latter likely involves listing out the reserved keywords for each dialect.
Additional context
No response