dask-contrib / dask-deltatable

A Delta Lake reader for Dask
BSD 3-Clause "New" or "Revised" License

AttributeError: 'pyarrow.lib.Schema' object has no attribute 'origin' when writing dask dataframe using dask-deltatable's to_deltalake() #80

Open Abhishek1005 opened 3 months ago

Abhishek1005 commented 3 months ago

I'm reading a Delta table into a Dask DataFrame using dask-deltatable's read_deltalake('path', engine='pyarrow'). After performing some manipulations, I'm trying to write the Dask DataFrame back out as a Delta table using to_deltalake('path', ddf).

However, I'm getting the following error. The Parquet files are created in the destination, but the delta log folder is not. On the Dask dashboard, the final "delta-commit" step is the one that fails.

AttributeError: 'pyarrow.lib.Schema' object has no attribute 'origin' corresponding to this line.

This is a Schema Violation exception, and I can bypass it by explicitly passing the schema when writing (using to_deltalake('path', ddf, schema=schema)). But specifying the schema every time I write is tedious and not a very good approach.
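To make the workaround less tedious in the meantime, the schema could be derived from the Dask DataFrame's metadata instead of being written by hand. This is only a minimal sketch: `write_with_inferred_schema` is a hypothetical helper (not part of dask-deltatable), it relies on the private `ddf._meta` attribute, and it assumes `to_deltalake` accepts a pyarrow schema through the `schema` keyword as in the call above. I have not verified that the inferred types (timestamp precision, for example) always match what the Delta writer expects.

```python
def write_with_inferred_schema(path, ddf, **writer_kwargs):
    """Write ``ddf`` as a Delta table, passing an explicitly inferred schema.

    Hypothetical helper: infers a pyarrow schema from the Dask DataFrame's
    metadata so the schema does not have to be spelled out on every write.
    """
    # Imported lazily so the helper can be defined even where the optional
    # dependencies are not loaded yet.
    import pyarrow as pa
    import dask_deltatable as ddt
    from dask.dataframe.dispatch import meta_nonempty

    # ``ddf._meta`` is an empty pandas DataFrame carrying the column dtypes.
    # ``meta_nonempty`` fills it with dummy rows so that object columns
    # (typically strings) are not inferred as the pyarrow ``null`` type.
    sample = meta_nonempty(ddf._meta)

    # Assumption: the pandas index is not meant to be stored as a column.
    schema = pa.Schema.from_pandas(sample, preserve_index=False)

    # Same call as the manual workaround, with the schema filled in.
    ddt.to_deltalake(path, ddf, schema=schema, **writer_kwargs)
    return schema
```

With this, the write becomes `write_with_inferred_schema('path', ddf)`. The returned schema can be printed to check what was inferred.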

FYI, I'm using deltalake==0.13.0 (as only this version supports dask-deltatable) and dask-deltatable==0.3.1.

Related Issue(s)

Also, I assume this is related to this issue: #686