Closed: spretto closed this issue 4 weeks ago
We need some more details to be able to help here. Can you provide a minimal reproducible example?
I want to provide a minimal reproducible example, but the problem seems specific to this delta table. After writing new rows to it with pyspark 3.5.0 and the delta jar "io.delta:delta-spark_2.12:3.0.0", I can't read it with pyarrow anymore. Other delta tables are still fine.
@spretto what kind of write action did you do?
I added new rows using the existing schema. I can see the new rows in spark with the correct partitions, but I get this error when I try to load them as a pyarrow dataset. I get the same error when loading old partitions, or any other part of the delta table.
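Roughly, the write and the failing read looked like this (a sketch, not my exact code; the path, schema, and partition column are placeholders):

```python
from pyspark.sql import SparkSession
from deltalake import DeltaTable

# Spark 3.5.0 with io.delta:delta-spark_2.12:3.0.0 on the classpath
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Append rows matching the table's existing schema; partitioning follows the table
new_rows = spark.createDataFrame([(1, "2024-01-01")], ["id", "date"])
new_rows.write.format("delta").mode("append").save("/path/to/table")

# The rows show up fine from Spark, in the expected partitions
spark.read.format("delta").load("/path/to/table").show()

# ...but any pyarrow read through delta-rs now fails
DeltaTable("/path/to/table").to_pyarrow_dataset()
```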
Can you try to reproduce it with the smallest sample table possible and then share the table and transaction log?
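Something along these lines would be enough (a sketch; the path and columns are placeholders): build the smallest partitioned table that still triggers the error, then attach the whole table directory, including the _delta_log folder.

```python
import shutil
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # configured with the same Delta extensions as your job

# Smallest table that still breaks: one partition, one row, one append
df = spark.createDataFrame([(1, "a")], ["id", "part"])
df.write.format("delta").partitionBy("part").save("/tmp/repro_table")
df.write.format("delta").mode("append").save("/tmp/repro_table")

# Verify the pyarrow read still fails on this table, then zip it for sharing;
# the _delta_log subdirectory is the part we need to inspect
shutil.make_archive("/tmp/repro_table", "zip", "/tmp/repro_table")
```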
Environment
Delta-rs version: python-v0.14.0
Binding: Python
Environment:
Bug
What happened: Error message when trying to run:

```python
dt = DeltaTable("/path/to/table")
dt.to_pyarrow_dataset()
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.9/site-packages/deltalake/table.py", line 866, in to_pyarrow_table
    return self.to_pyarrow_dataset(
  File "/home/user/.local/lib/python3.9/site-packages/deltalake/table.py", line 809, in to_pyarrow_dataset
    file_sizes = self.get_add_actions().to_pydict()
  File "/home/user/.local/lib/python3.9/site-packages/deltalake/table.py", line 964, in get_add_actions
    return self._table.get_add_actions(flatten)
ValueError: all columns in a record batch must have the same length
```
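The traceback fails inside get_add_actions() before any data files are touched, so calling it directly (a sketch; the path is a placeholder) should reproduce the same error without going through the dataset at all:

```python
from deltalake import DeltaTable

dt = DeltaTable("/path/to/table")
# to_pyarrow_dataset() calls this internally to collect file sizes;
# it raises the same ValueError on this table
batch = dt.get_add_actions(flatten=True)
print(batch.to_pydict())
```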
What you expected to happen: A pyarrow dataset to be returned.
More details: This used to work fine before updating pyspark, the delta jar, and the delta-rs package.
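One thing I can try while putting a sample table together (a sketch; the path is a placeholder) is to dump the add actions straight out of the JSON commits in _delta_log and compare the entries the new Spark/delta versions wrote against the older ones:

```python
import glob
import json

# Each commit file is JSON lines; add actions carry path, size, partitionValues, stats
for commit in sorted(glob.glob("/path/to/table/_delta_log/*.json")):
    with open(commit) as f:
        for line in f:
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                print(commit, add.get("path"), add.get("partitionValues"))
```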