goccy / go-zetasqlite

A database driver library that interprets ZetaSQL queries and runs them using SQLite3
MIT License

Incorrect column count used for aliased columns #191

Closed by ohaibbq 7 months ago

ohaibbq commented 8 months ago

When formatting a QueryStmtNode, we do not currently use its OutputColumnList().

Take this query:

WITH toks AS (SELECT true AS x, 1 AS y)
SELECT DISTINCT x, x as y FROM toks

The ZetaSQL analyzer resolves it into the following AST:

QueryStmt
+-output_column_list=
| +-$distinct.x#5 AS x [BOOL]
| +-$distinct.x#5 AS y [BOOL]
+-query=
  +-WithScan
    +-parse_location=0-80
    +-column_list=[$distinct.x#5]
    +-with_entry_list=
    | +-WithEntry
    |   +-with_query_name="toks"
    |   +-with_subquery=
    |     +-ProjectScan
    |       +-parse_location=14-38
    |       +-column_list=toks.[x#1, y#2]
    |       +-expr_list=
    |       | +-x#1 := Literal(parse_location=21-25, type=BOOL, value=true)
    |       | +-y#2 := Literal(parse_location=32-33, type=INT64, value=1)
    |       +-input_scan=
    |         +-SingleRowScan
    +-query=
      +-AggregateScan
        +-column_list=[$distinct.x#5]
        +-input_scan=
        | +-WithRefScan(column_list=toks.[x#3, y#4], with_query_name="toks")
        +-group_by_list=
          +-x#5 := ColumnRef(type=BOOL, column=toks.x#3)

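The analyzer prunes the AggregateScan's column list down to the single column $distinct.x#5, since x is the only column the outer query references. Both output columns, x and y, point at that same column in output_column_list. When formatting a QueryStmtNode we currently use only QueryStmtNode.Query() and ignore its OutputColumnList(), so the formatted query follows the scan's one-entry column list rather than the two output columns the statement declares.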