Open ukclivecox opened 1 month ago
This appears to be an issue with the core engine, not just the sql implementation.
This produces the same error.
df1 = daft.from_pydict({"idx":[1,2],"val":[10,20]})
df2 = daft.from_pydict({"idx":[],"score":[]})
df1.join(df2, on="idx").show()
This is an issue with typing rather than the joins being problematic:
In [4]: df1 = daft.from_pydict({"idx":[1,2],"val":[10,20]})
...: df2 = daft.from_pydict({"idx":daft.Series.from_pylist([]).cast(daft.DataType.int64()),"score":[]})
...:
...: df1.join(df2, on="idx").show()
╭───────┬───────┬───────╮
│ idx ┆ val ┆ score │
│ --- ┆ --- ┆ --- │
│ Int64 ┆ Int64 ┆ Null │
╞═══════╪═══════╪═══════╡
╰───────┴───────┴───────╯
(No data to display: Materialized dataframe has no rows)
I think we could fix our join logic to allow joins against null columns though. @kevinzwang do you think that might be a fast fix?
Describe the bug
SQL Joins, either inner or left fail when right hand side is an empty table.
To Reproduce
This fails with
This also happens with
left joins
where its more usual to be joining with tables that may have missing rows. It works if the table has at least 1 row but fails with the same error if the table has no rows.Expected behavior
An inner join on an empty table with valid column schema should return an empty dataframe. A left join on a table with empty rows should return NULLs for all columns of right table.
Component(s)
SQL
Additional context
Testing with nightly build
0.3.8+dev0019.e4c6f3fa
that has additions for joins from https://github.com/Eventual-Inc/Daft/pull/3066