dask-contrib / dask-awkward

Native Dask collection for awkward arrays, and the library to use it.
https://dask-awkward.readthedocs.io
BSD 3-Clause "New" or "Revised" License
61 stars 19 forks

`dak.to_parquet` should default to `extensionarray=True`, like `ak.to_parquet` #540

Open jpivarski opened 2 months ago

jpivarski commented 2 months ago

https://github.com/dask-contrib/dask-awkward/blob/ca257cade3e0a3bdd7d2607858561170cdfe21f0/src/dask_awkward/lib/io/parquet.py#L511

  1. This option is required for status to round-trip through Parquet.
  2. The ak and dak versions of a function shouldn't have different defaults: ak dispatches to dak, so a differing default can make the ak function appear to contradict its own documentation.
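
To illustrate point 2, here is a minimal sketch of how a mismatch in defaults between two functions could be detected with `inspect.signature`. The functions `ak_to_parquet` and `dak_to_parquet` below are hypothetical stand-ins, not the real library functions (whose signatures have many more parameters); their `extensionarray` defaults simply mirror the situation described in this issue. The helper `mismatched_defaults` is also an illustrative name.

```python
import inspect


# Hypothetical stand-ins for the two writers discussed in this issue.
# Only the `extensionarray` keyword is modelled; the defaults mirror
# the mismatch the issue describes.
def ak_to_parquet(array, destination, *, extensionarray=True):
    ...


def dak_to_parquet(array, destination, *, extensionarray=False):
    ...


def mismatched_defaults(func_a, func_b):
    """Return {name: (default_a, default_b)} for every parameter that
    both functions share but whose default values differ."""
    params_a = inspect.signature(func_a).parameters
    params_b = inspect.signature(func_b).parameters
    diffs = {}
    for name in params_a.keys() & params_b.keys():
        default_a = params_a[name].default
        default_b = params_b[name].default
        if default_a != default_b:
            diffs[name] = (default_a, default_b)
    return diffs


print(mismatched_defaults(ak_to_parquet, dak_to_parquet))
# {'extensionarray': (True, False)}
```

A check of this kind could in principle run as a unit test over each ak/dak function pair, so that defaults cannot drift apart unnoticed.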

martindurant commented 2 months ago

> This option is required for status to round-trip through Parquet.

I was under the impression that we would be moving away from extension arrays and putting the required metadata into the global Parquet key-value store instead. The original reason for `False` here was that some combinations in the past caused hard crashes in Arrow on read. We really don't want that! Perhaps it has all been fixed, but it still feels like the more complex option that people outside of HEP won't be wanting*.

* There is an argument we can have about who we expect to directly use ak dispatch versus dask-awkward versus akimbo or other avenues for reaching this code.