[BUG] Fix empty struct fields

colin-ho commented 4 months ago

Closes #1821

Fixes a few bugs that deal with empty structs.

- allow empty structs in JSONs
- allow nested empty structs from in-memory reads
- remove the "":null placeholder in repr
- remove the "":null placeholder upon egress to python
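
For readers unfamiliar with the placeholder, here is a minimal sketch of what "removing the `"":null` placeholder upon egress to python" could look like on plain Python values. This is not Daft's actual implementation; `strip_placeholder` and `PLACEHOLDER_KEY` are hypothetical names used only for illustration, and the sketch assumes the placeholder shows up as a `""` key mapped to `None`.

```python
from typing import Any

# Hypothetical constant: the empty-string field name used as the placeholder.
PLACEHOLDER_KEY = ""


def strip_placeholder(value: Any) -> Any:
    """Recursively drop {"": None} placeholder entries from nested dicts and lists."""
    if isinstance(value, dict):
        return {
            key: strip_placeholder(item)
            for key, item in value.items()
            if not (key == PLACEHOLDER_KEY and item is None)
        }
    if isinstance(value, list):
        return [strip_placeholder(item) for item in value]
    # Scalars and None (a null struct) pass through unchanged.
    return value


rows = [{"": None}, None, {"inner": {"": None}, "x": 1}]
cleaned = strip_placeholder(rows)
print(cleaned)  # [{}, None, {'inner': {}, 'x': 1}]
```

One caveat of this sketch: a real field that is genuinely named `""` and holds a null would be dropped as well, so a real implementation would probably need to decide based on the schema rather than on the values.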

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (4734862) 85.60% compared to head (7251132) 85.54%. Report is 3 commits behind head on main.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/1833/graphs/tree.svg?width=650&height=150&src=pr&token=J430QVFE89&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/1833?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)

```diff
@@            Coverage Diff             @@
##             main    #1833      +/-   ##
==========================================
- Coverage   85.60%   85.54%   -0.07%
==========================================
  Files          55       55
  Lines        6079     6102      +23
==========================================
+ Hits         5204     5220      +16
- Misses        875      882       +7
```

| [Files](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/1833?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc) | Coverage Δ | |
|---|---|---|
| [daft/arrow\_utils.py](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/1833?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-ZGFmdC9hcnJvd191dGlscy5weQ==) | `96.82% <100.00%> (+1.28%)` | :arrow_up: |

... and [5 files with indirect coverage changes](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/1833/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)

samster25 commented 4 months ago

Looks like you are having a failure with pyarrow 6.0.1 in CI

colin-ho commented 4 months ago

> Looks like you are having a failure with pyarrow 6.0.1 in CI

fixed

the tests were failing during a cast from struct(["":null]) to struct(), which was needed to build a pyarrow table for the equality check against a daft df.to_arrow() table. instead, i changed the original type to struct(), and the test now checks that the daft datatype is struct(["":null]) rather than struct()

colin-ho commented 4 months ago

there's a bug, ~~will fix~~ fixed in latest commit:

In [1]: import daft

In [2]: df = daft.from_pydict({"a":[{},None,None],"b":[1,2,3]})

In [3]: df = df.where(df["a"].is_null())

In [4]: df.show()
╭──────────┬───────╮
│ a        ┆ b     │
│ ---      ┆ ---   │
│ Struct[] ┆ Int64 │
╞══════════╪═══════╡
╰──────────┴───────╯

(No data to display: Materialized dataframe has no rows)
