Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

errors when calling write_deltalake #2370

Open rkunnamp opened 3 months ago

rkunnamp commented 3 months ago

Describe the bug

Getting the following error when calling write_deltalake

    File /opt/conda/lib/python3.11/site-packages/daft/table/table_io.py:691, in write_deltalake.<locals>.file_visitor(written_file)
        689 def file_visitor(written_file: Any) -> None:
        690     path, partition_values = get_partitions_from_path(written_file.path)
    --> 691     stats = get_file_stats_from_metadata(written_file.metadata)
        693     # PyArrow added support for written_file.size in 9.0.0
        694     if ARROW_VERSION >= (9, 0, 0):

    File /opt/conda/lib/python3.11/site-packages/daft/table/table_io.py:687, in write_deltalake.<locals>.get_file_stats_from_metadata(metadata)
        686 def get_file_stats_from_metadata(metadata):
    --> 687     deltalake.writer.get_file_stats_from_metadata(metadata, -1)

    TypeError: get_file_stats_from_metadata() missing 1 required positional argument: 'columns_to_collect_stats'
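
For context on the failure: deltalake 0.17 made columns_to_collect_stats a required argument of deltalake.writer.get_file_stats_from_metadata (the name comes straight from the TypeError above). A minimal compatibility shim, sketched below under the assumption that passing None means "collect stats for all columns", would tolerate both the old and new signatures; this is illustrative only, not Daft's actual fix.

    from typing import Any

    import deltalake.writer


    def get_file_stats_compat(metadata: Any) -> Any:
        try:
            # Pre-0.17 call, exactly as it appears in the traceback above.
            return deltalake.writer.get_file_stats_from_metadata(metadata, -1)
        except TypeError:
            # deltalake >= 0.17 requires an extra columns_to_collect_stats
            # argument; passing None here is an assumption meaning
            # "collect stats for every column".
            return deltalake.writer.get_file_stats_from_metadata(metadata, -1, None)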

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page and download the January 2024 Yellow Taxi trip record data in Parquet format (at the time of writing this bug, the file was https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet)
  2. Now execute the following code:
    import daft
    dt = daft.read_parquet("yellow_tripdata_2024-01.parquet")
    dt.write_deltalake("t5")

The error mentioned above is obtained. On inspecting the t5 folder, I found that the metadata files were not written.
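
A quick way to confirm that the table was never committed is to look for the Delta transaction log; this check is a sketch assuming the t5 path from the repro above:

    import os

    # A Delta table commits its metadata as JSON files under _delta_log/.
    # After the failed write above, the directory is expected to be missing
    # or empty, matching the observation that the metadata files were not written.
    log_dir = os.path.join("t5", "_delta_log")
    committed = os.path.isdir(log_dir) and bool(os.listdir(log_dir))
    print("Delta log committed:", committed)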

jaychia commented 3 months ago

Hey @rkunnamp!

We realized that deltalake made a backwards-incompatible change in version 0.17... If you pip install "deltalake<0.17" instead, this should be fixed 😓

We'll be figuring out a good solution here (possibly pinning deltalake to a lower version and updating our code).
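
For anyone applying the workaround in the meantime, a quick sanity check (a sketch, using the 0.17.0 cutoff mentioned above) before calling write_deltalake:

    from importlib.metadata import version

    from packaging.version import parse

    # Warn if the installed deltalake is new enough to have the changed
    # get_file_stats_from_metadata signature described in this issue.
    if parse(version("deltalake")) >= parse("0.17.0"):
        print('deltalake >= 0.17 detected; consider pip install "deltalake<0.17" until a Daft fix lands.')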

rkunnamp commented 3 months ago

Thank you for that note.