Quansight / lsst_dashboard

LSST Dashboard https://quansight.github.io/lsst_dashboard/
BSD 3-Clause "New" or "Revised" License

`label` column causing issues in reading repartitioned parquet files. #125

Closed dharhas closed 4 years ago

dharhas commented 4 years ago

Similar bugs happen with both kartothek and spatialpandas. I suspect that the 'null' value in this column is making the repartitioned data impossible to read. Other categorical data columns work fine. Casting the column to string also does not seem to work consistently.

The current workaround is to drop the column.

@timothydmorton, my guess is that this has something to do with the self._null_label assignment you showed me, which may be placing an object rather than a string in the null rows.
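A minimal sketch of how that guess could be checked, using pandas only. The frame below is a hypothetical stand-in for the real table, `non_string_values` is a helper made up for illustration, and `None` is only assumed to play the role of the null label (the actual value assigned by `self._null_label` is not shown in this issue). The last line applies the drop-the-column workaround.

```python
import pandas as pd

# Hypothetical stand-in for the real table; None is an assumed null label.
df = pd.DataFrame({
    "ra": [10.1, 10.2, 10.3],
    "label": pd.Series(["star", None, "galaxy"], dtype=object),
})


def non_string_values(series):
    """Return the entries of `series` that are not Python strings."""
    mask = pd.Series(
        [not isinstance(value, str) for value in series],
        index=series.index,
    )
    return series[mask]


# Rows where `label` holds something other than a string.
bad = non_string_values(df["label"])
print(bad)

# Workaround from this issue: drop the column before repartitioning.
df_clean = df.drop(columns=["label"])
```

If `bad` is non-empty on the real data, the column is mixed-type, which would be consistent with the guess above but does not prove it is the cause.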

Reproducer here: https://github.com/holoviz/spatialpandas/issues/29

Note that reading individual partitions with pyarrow works. This seems like it might be an issue with dask not being able to infer the metadata correctly.
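One way to probe the metadata hypothesis is to compare the dtype of each column across partitions, since a column whose dtype differs between partitions is hard to describe with a single inferred schema. This is a rough sketch, not a confirmed diagnosis: `dtype_mismatches` is a hypothetical helper, and the two in-memory frames are assumed stand-ins for partitions that would in practice be loaded one at a time (for example with `pyarrow.parquet.read_table(path).to_pandas()`).

```python
import pandas as pd


def dtype_mismatches(partitions):
    """Return {column: set of dtype names} for columns whose dtype
    is not the same in every partition."""
    seen = {}
    for part in partitions:
        for column, dtype in part.dtypes.items():
            seen.setdefault(column, set()).add(str(dtype))
    return {column: kinds for column, kinds in seen.items() if len(kinds) > 1}


# Assumed stand-ins for two partitions: one with a proper categorical
# `label`, one where `label` contains only null placeholders.
part_a = pd.DataFrame({
    "label": pd.Categorical(["star", "galaxy"]),
    "ra": [1.0, 2.0],
})
part_b = pd.DataFrame({
    "label": pd.Series([None, None], dtype=object),
    "ra": [3.0, 4.0],
})

mismatches = dtype_mismatches([part_a, part_b])
print(mismatches)
```

Columns reported here are the ones worth inspecting first; an empty result would suggest the problem lies elsewhere.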

dharhas commented 4 years ago

We are repartitioning from the source data now, so this is no longer relevant.