dask / fastparquet

python implementation of the parquet columnar file format.
Apache License 2.0

attrs persistence for Pandas #900

Closed davetapley closed 1 year ago

davetapley commented 1 year ago

Pandas 2.10 has ⬇️, but it only works with pyarrow. If pyarrow isn't installed, it falls back to fastparquet (per docs), and attrs is always loaded as {}.

I assume this needs a fix here?

martindurant commented 1 year ago

Yes, this can be fixed here. fastparquet does support storing metadata using write(fn, df, ..., custom_metadata=), so if df has these attrs, we can extract them. The whole of this metadata appears in .key_value_metadata when opening the file, so we can restore them.
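A minimal sketch of that round trip, not the eventual implementation: it assumes attrs is JSON-serializable and stores it under a single key. The key name `PANDAS_ATTRS` and the helper names `attrs_to_metadata`, `metadata_to_attrs` and `roundtrip` are illustrative choices, not something this thread specifies.

```python
import json

# Assumed key name for illustration; the thread does not specify one.
ATTRS_KEY = "PANDAS_ATTRS"


def attrs_to_metadata(attrs):
    """Serialize an attrs dict into a {str: str} mapping for custom_metadata."""
    return {ATTRS_KEY: json.dumps(attrs)} if attrs else {}


def metadata_to_attrs(key_value_metadata):
    """Recover attrs from a file's key/value metadata, or {} if none stored."""
    raw = key_value_metadata.get(ATTRS_KEY)
    if raw is None:
        return {}
    if isinstance(raw, bytes):  # tolerate values that come back as bytes
        raw = raw.decode("utf-8")
    return json.loads(raw)


def roundtrip(df, path):
    """Write df together with its attrs, read it back, and restore attrs."""
    import fastparquet  # imported lazily so the helpers above work without it

    # Pass None rather than an empty dict when there is nothing to store.
    fastparquet.write(path, df, custom_metadata=attrs_to_metadata(df.attrs) or None)
    pf = fastparquet.ParquetFile(path)
    out = pf.to_pandas()
    out.attrs = metadata_to_attrs(pf.key_value_metadata)
    return out
```

With this in place, `roundtrip(df, "example.parquet")` should return a frame whose attrs match those of `df`, provided every value in attrs survives JSON encoding.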

I am disappointed in pandas that they have once again ignored fastparquet.

Would you like to implement the functionality, @davetapley ?