apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
6k stars 1.14k forks source link

Decorrelation rewrite doesn't respect identifier normalizer setting? #12422

Open paf31 opened 2 weeks ago

paf31 commented 2 weeks ago

Describe the bug

The result of a query which relies on the decorrelation optimization seems to depend on the DATAFUSION_SQL_PARSER_ENABLE_IDENT_NORMALIZATION setting: if it is on, the query works as expected. If not, I get a field-not-in-scope error:

Schema error: No field named __scalar_sq_1.invoiceid. Valid fields are "Invoice"."invoiceId".

To Reproduce

DATAFUSION_SQL_PARSER_ENABLE_IDENT_NORMALIZATION=false datafusion-cli
SELECT
    *,
    (
        SELECT
            count(1)
        FROM
        VALUES
            (1),
            (1),
            (2) AS InvoiceLine(invoiceId)
        WHERE
            InvoiceLine.invoiceId = Invoice.invoiceId
    )
FROM
VALUES
    (1),
    (2),
    (3) AS Invoice(invoiceId);

Expected behavior

+------------+-----------------+
| invoice_id | count(Int64(1)) |
+------------+-----------------+
| 1          | 2               |
| 2          | 1               |
| 3          | 0               |
+------------+-----------------+

Additional context

Error disappears if DATAFUSION_SQL_PARSER_ENABLE_IDENT_NORMALIZATION flag is turned on.

Error disappears if I manually snake-case all idents in the query.

The plan illustrates the issue:

Projection: Invoice.invoiceId, CASE WHEN __scalar_sq_1.__always_true IS NULL THEN Int64(0) ELSE __scalar_sq_1.count(Int64(1)) END AS count(Int64(1))
  Left Join:  Filter: __scalar_sq_1.invoiceid = Invoice.invoiceId
    SubqueryAlias: Invoice
      Projection: column1 AS invoiceId
        Values: (Int64(1)), (Int64(2)), (Int64(3))
    SubqueryAlias: __scalar_sq_1
      Projection: count(Int64(1)), Boolean(true) AS __always_true

==================================^ InvoiceLine.invoiceId is missing here

        Aggregate: groupBy=[[InvoiceLine.invoiceId]], aggr=[[count(Int64(1))]]
          SubqueryAlias: InvoiceLine
            Projection: column1 AS invoiceId
              Values: (Int64(1)), (Int64(1)), (Int64(2))
JasonLi-cn commented 1 week ago

The main branch has fixed this issue. I guess it has something to do with this PR: https://github.com/apache/datafusion/pull/12426