Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

`daft.lit(val)` where `val` is of type `daft.Series` has incorrect behavior #3287

Open kevinzwang opened 2 weeks ago

Describe the bug

When you pass a Series into daft.lit, it is treated as a column rather than as a literal, and evaluation fails if its length does not match the length of the table. Instead, it should be treated as a single list value and broadcast to every row, consistent with other literal types.

To Reproduce

>>> import daft
>>> df = daft.from_pydict({"foo": [1, 2, 3], "bar": ["a", "b", "c"]})
>>> s = daft.Series.from_pylist(["x", "y", "z"])
>>> df = df.with_column("baz", daft.lit(s))
>>> df.show()
╭───────┬──────┬──────╮
│ foo   ┆ bar  ┆ baz  │
│ ---   ┆ ---  ┆ ---  │
│ Int64 ┆ Utf8 ┆ Utf8 │
╞═══════╪══════╪══════╡
│ 1     ┆ a    ┆ x    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2     ┆ b    ┆ y    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3     ┆ c    ┆ z    │
╰───────┴──────┴──────╯

Expected behavior

╭───────┬──────┬─────────────────╮
│ foo   ┆ bar  ┆ baz             │
│ ---   ┆ ---  ┆ ---             │
│ Int64 ┆ Utf8 ┆ List[Utf8]      │
╞═══════╪══════╪═════════════════╡
│ 1     ┆ a    ┆ ['x', 'y', 'z'] │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2     ┆ b    ┆ ['x', 'y', 'z'] │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ c    ┆ ['x', 'y', 'z'] │
╰───────┴──────┴─────────────────╯
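
To make the expected semantics concrete, here is a minimal pure-Python sketch of "literal means one value broadcast to every row". It does not use Daft and is not Daft's implementation: `Literal`, `evaluate`, and `with_column` below are hypothetical names invented for illustration, and a plain Python list stands in for a `daft.Series`.

```python
from typing import Any, Dict, List


class Literal:
    """Hypothetical stand-in for a literal expression (not Daft's API)."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def evaluate(self, num_rows: int) -> List[Any]:
        # A literal is a single value repeated once per row.
        # A list-like value is kept whole as one list cell per row,
        # never unpacked into separate rows.
        if isinstance(self.value, (list, tuple)):
            return [list(self.value) for _ in range(num_rows)]
        return [self.value] * num_rows


def with_column(
    table: Dict[str, List[Any]], name: str, expr: Literal
) -> Dict[str, List[Any]]:
    # The row count comes from the existing table, not from the literal.
    num_rows = len(next(iter(table.values()))) if table else 0
    return {**table, name: expr.evaluate(num_rows)}


table = {"foo": [1, 2, 3], "bar": ["a", "b", "c"]}
result = with_column(table, "baz", Literal(["x", "y", "z"]))
print(result["baz"])
# [['x', 'y', 'z'], ['x', 'y', 'z'], ['x', 'y', 'z']]
```

Under this model the length of the list no longer has to match the table: `Literal(["x", "y"])` against the same three-row table yields three rows of `['x', 'y']` rather than an error.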

Component(s)

Expressions

Additional context

How we would implement this: