Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

SQL: aliases cause error if one is prefix of another: FieldNotFound error #3311

Status: Closed (ukclivecox closed this 2 days ago)

ukclivecox commented 3 days ago

Describe the bug

In SQL, a table alias such as d1 causes an error when that table is joined with a table whose alias starts with the same characters, such as d1b.

To Reproduce

import daft

df1 = daft.from_pydict({"idx":[1,2],"val":[10,20]})

df2 = daft.from_pydict({"idx":[1,2],"score":[0.1,0.2]})

df_sql = daft.sql("SELECT d1.* FROM df1 AS d1 JOIN df2 AS d2 ON d1.idx = d2.idx").show()
df_sql = daft.sql("SELECT d1.* FROM df1 AS d1 JOIN df2 AS d1b ON d1.idx = d1b.idx").show()

The first query works; the second fails with: Daft error: DaftError::FieldNotFound Column "d1.idx" not found in schema: ["idx", "val", "d1b.idx", "score"]

---------------------------------------------------------------------------
InvalidSQLException                       Traceback (most recent call last)
Cell In[53], line 8
      5 df2 = daft.from_pydict({"idx":[1,2],"score":[0.1,0.2]})
      7 df_sql = daft.sql("SELECT d1.* FROM df1 AS d1 JOIN df2 AS d2 ON d1.idx = d2.idx").show()
----> 8 df_sql = daft.sql("SELECT d1.* FROM df1 AS d1 JOIN df2 AS d1b ON d1.idx = d1b.idx").show()

    [... skipping hidden 2 frame]

File python3.10/site-packages/daft/sql/sql.py:187, in sql(sql, catalog, register_globals)
    184 planning_config = get_context().daft_planning_config
    186 _py_catalog = catalog._catalog
--> 187 _py_logical = _sql(sql, _py_catalog, planning_config)
    188 return DataFrame(LogicalPlanBuilder(_py_logical))

InvalidSQLException: Daft error: DaftError::FieldNotFound Column "d1.idx" not found in schema: ["idx", "val", "d1b.idx", "score"]
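
The title's diagnosis (one alias being a prefix of another) is consistent with a name resolver that matches qualified columns by raw string prefix. The sketch below only illustrates that failure mode in plain Python. It is not Daft's implementation and makes no claim about where the bug actually lives; `owns_naive` and `owns_exact` are hypothetical helpers, and the fully qualified column list is an assumption made for the example.

```python
def owns_naive(alias: str, column: str) -> bool:
    # Hypothetical buggy check: any column name that starts with the alias matches,
    # so alias "d1" also claims columns qualified with "d1b".
    return column.startswith(alias)


def owns_exact(alias: str, column: str) -> bool:
    # Hypothetical stricter check: the alias must be followed by the "." separator.
    return column.startswith(alias + ".")


# Assumed, fully qualified column names for the two aliased tables.
columns = ["d1.idx", "d1.val", "d1b.idx", "d1b.score"]

naive = [c for c in columns if owns_naive("d1", c)]
exact = [c for c in columns if owns_exact("d1", c)]

print(naive)  # columns claimed for alias d1 by the naive check
print(exact)  # columns claimed for alias d1 by the exact check
```

With the naive check, alias d1 also claims the d1b columns; requiring the "." separator keeps the two aliases apart.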

Expected behavior

The choice of SQL alias names should not affect whether a query succeeds: both queries above should run and return the same result.
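
For comparison, here is a minimal sketch of the expected behavior using Python's built-in sqlite3 module as a reference SQL engine. sqlite3 is an assumption chosen for illustration and is unrelated to Daft; the table and column names mirror the reproduction above, and `run` is a hypothetical helper.

```python
import sqlite3

# In-memory database holding the same data as df1 and df2 in the reproduction.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE df1 (idx INTEGER, val INTEGER);
    CREATE TABLE df2 (idx INTEGER, score REAL);
    INSERT INTO df1 VALUES (1, 10), (2, 20);
    INSERT INTO df2 VALUES (1, 0.1), (2, 0.2);
    """
)


def run(query: str) -> list:
    # Sort the rows so the comparison does not depend on join output order.
    return sorted(conn.execute(query).fetchall())


rows_d2 = run("SELECT d1.* FROM df1 AS d1 JOIN df2 AS d2 ON d1.idx = d2.idx")
rows_d1b = run("SELECT d1.* FROM df1 AS d1 JOIN df2 AS d1b ON d1.idx = d1b.idx")

print(rows_d2)
print(rows_d1b)
```

Both queries return the same rows here, which is the behavior the issue asks for regardless of how the second table is aliased.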

Component(s)

SQL

Additional context

Version 0.3.13+dev0020.84db665b