delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Add ds.write_dataset kwarg overrides to write_deltalake function #1683

Open · someaveragepunter opened this issue 1 year ago

someaveragepunter commented 1 year ago

Description

Allow passing ds.write_dataset kwarg overrides into the write_deltalake function: https://github.com/delta-io/delta-rs/blob/fcfd1bfa08b8ddda2a606d7f1f58b75e89206d45/python/deltalake/writer.py#L327-L344
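One way the pass-through could work is sketched below, assuming write_deltalake builds a dict of default kwargs for ds.write_dataset and merges caller overrides on top of it. The helper name `build_write_dataset_kwargs` and the set of protected keys are hypothetical. `base_dir` and `file_visitor` are assumed to be arguments the Delta writer must control itself (the table location and the hook that collects written files for the transaction log), so overriding them is rejected.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical: ds.write_dataset kwargs the Delta writer must own, so callers
# are not allowed to override them.
PROTECTED_KEYS = frozenset({"base_dir", "file_visitor"})


def build_write_dataset_kwargs(
    defaults: Mapping[str, Any],
    overrides: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge caller overrides on top of the writer's default ds.write_dataset kwargs."""
    overrides = dict(overrides or {})
    clashes = PROTECTED_KEYS.intersection(overrides)
    if clashes:
        raise ValueError(f"cannot override write_dataset kwargs: {sorted(clashes)}")
    merged = dict(defaults)   # copy so the defaults mapping is left untouched
    merged.update(overrides)  # caller values win for every non-protected key
    return merged
```

The merged dict would then be splatted into the call, e.g. `ds.write_dataset(data, **merged)`. Any future ds.write_dataset parameter becomes reachable without changing write_deltalake's signature.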

Use Case

My specific use case is to override the basename_template parameter. When I'm bombarding S3 with thousands of concurrent tasks, reading a parquet file directly, rather than going through the transaction log to find it, yields performance benefits. Explicitly naming the parquet file therefore lets me specify the filename statically and deterministically, instead of querying for it at runtime.
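The sketch below shows how a reader could predict file paths from a caller-chosen basename_template. pyarrow replaces the `{i}` token in basename_template with an auto-incremented integer and rejects templates without it. The assumptions here: an unpartitioned write, a counter starting at 0 (matching pyarrow's default `part-0.parquet`), and a caller who already knows how many files the write produced. The helper names are hypothetical.

```python
from typing import List


def expected_file_names(basename_template: str, n_files: int) -> List[str]:
    """Predict data file names for an unpartitioned write (assumes the counter starts at 0)."""
    if "{i}" not in basename_template:
        # pyarrow refuses basename templates that lack the '{i}' token
        raise ValueError("basename_template must contain '{i}'")
    return [basename_template.replace("{i}", str(i)) for i in range(n_files)]


def expected_uris(table_uri: str, basename_template: str, n_files: int) -> List[str]:
    """Join the predicted names onto the table root so readers can open them directly."""
    root = table_uri.rstrip("/")
    return [f"{root}/{name}" for name in expected_file_names(basename_template, n_files)]
```

For example, `expected_uris("s3://bucket/table", "task-42-{i}.parquet", 2)` gives `s3://bucket/table/task-42-0.parquet` and `s3://bucket/table/task-42-1.parquet`. This only holds if nothing else (max_rows_per_file, partitioning into subdirectories) changes how many files are written or where. A fixed name also gives up the collision protection that a per-write unique component provides.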

Furthermore, this would future-proof the function by exposing any additional kwargs or enhancements to the pyarrow datasets API.

I'm happy to propose the change and submit PR if this is an acceptable enhancement.

ion-elgreco commented 3 months ago

@someaveragepunter can you elaborate a bit more on the use case? We use UUIDs so that there aren't any file collisions
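The collision concern in the reply can be illustrated with a small sketch: a template containing a fresh UUID renders a different first file name on every write, while a fixed template renders the same name for every writer that uses it. The `{version}-{uuid}-{i}.parquet` layout here is a hypothetical scheme, not a claim about the exact names delta-rs produces.

```python
import uuid


def uuid_template(version: int) -> str:
    # Hypothetical naming scheme: a new UUID per write keeps names unique across writers.
    return f"{version}-{uuid.uuid4()}-{{i}}.parquet"


def names_collide(template_a: str, template_b: str) -> bool:
    # Two writers clash on their first file if both templates render the same name for i=0.
    return template_a.replace("{i}", "0") == template_b.replace("{i}", "0")
```

Two concurrent writers using the same fixed template (e.g. `"fixed-{i}.parquet"`) would overwrite each other's first file, which is the trade-off a deterministic-name override would have to accept or guard against.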