delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Optimize enhancements #1441

Open wjones127 opened 1 year ago

wjones127 commented 1 year ago

Description

This is an umbrella issue for a variety of improvements we could make to optimize:

mjducut-oe commented 1 year ago

Hello @wjones127, I just want to check whether the compression option is already exposed in Python's write_deltalake. I'm not sure whether I missed it somewhere or it's not ready yet. We're really hoping to use Snappy compression on our outputs, but it might not be available yet. In any case, could you recommend a workaround?

Thank you so much!

wjones127 commented 1 year ago

the compression option is already exposed in python write_deltalake?

Nope, no one has made a PR for that yet. I don't think there's any workaround at the moment. Shouldn't be too hard to implement though if someone is motivated to contribute it.

ion-elgreco commented 1 year ago

the compression option is already exposed in python write_deltalake?

Nope, no one has made a PR for that yet. I don't think there's any workaround at the moment. Shouldn't be too hard to implement though if someone is motivated to contribute it.

I can maybe centralize this in a function once the update and merge PRs are in. I also partially exposed the writer properties in those. It would be good to make this available across all the APIs.
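As a rough illustration of what "centralizing" could look like, here is a minimal pure-Python sketch. It is not delta-rs code: `WriterOptions`, `normalize_writer_options`, `plan_write`, `plan_merge`, and the codec list are all hypothetical names made up for this example. The idea is only that the settings are validated in one place and every operation reuses the result.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative codec names for this sketch only; not taken from delta-rs.
SUPPORTED_COMPRESSION = {"UNCOMPRESSED", "SNAPPY", "GZIP", "ZSTD", "LZ4"}


@dataclass(frozen=True)
class WriterOptions:
    """Hypothetical container for writer settings shared by several operations."""
    compression: Optional[str] = None
    max_row_group_size: Optional[int] = None


def normalize_writer_options(
    compression: Optional[str] = None,
    max_row_group_size: Optional[int] = None,
) -> WriterOptions:
    """Validate the settings once so that every operation can reuse them."""
    if compression is not None:
        compression = compression.upper()
        if compression not in SUPPORTED_COMPRESSION:
            raise ValueError(f"unsupported compression: {compression!r}")
    if max_row_group_size is not None and max_row_group_size <= 0:
        raise ValueError("max_row_group_size must be positive")
    return WriterOptions(compression, max_row_group_size)


# Hypothetical stand-ins for write and merge; both accept the same
# validated options object instead of parsing the arguments themselves.
def plan_write(options: WriterOptions) -> dict:
    return {"operation": "write", "compression": options.compression}


def plan_merge(options: WriterOptions) -> dict:
    return {"operation": "merge", "compression": options.compression}


options = normalize_writer_options(compression="snappy")
write_plan = plan_write(options)
merge_plan = plan_merge(options)
```

With this shape, adding a new setting means touching the shared function once rather than each API separately.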

mjducut-oe commented 1 year ago

@ion-elgreco, sounds great! This will really help us especially with storage utilization on writes. Will keep watch and thank you! 🙏👍