delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

z_order `max_spill_size` parameter incorrectly documented #2205

Open wjones127 opened 4 months ago

wjones127 commented 4 months ago

The parameter's documentation says that it is the maximum number of bytes spilled to disk:

https://github.com/delta-io/delta-rs/blob/77ddd7cb1c93aa28e5ce709d4e5d0f7c2bde2bd2/crates/core/src/operations/optimize.rs#L173-L174

But it should actually be the maximum number of bytes to keep in memory before spilling to disk. We pass it to DataFusion's `FairSpillPool`, which is a memory pool.

https://github.com/delta-io/delta-rs/blob/77ddd7cb1c93aa28e5ce709d4e5d0f7c2bde2bd2/crates/core/src/operations/optimize.rs#L1167
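To illustrate the difference between the two readings, here is a toy sketch in plain Python. It is not delta-rs or DataFusion code, and `SpillingBuffer`, `memory_limit_bytes`, and `push` are hypothetical names made up for this example. It only models the intended semantics: the limit caps what is held in memory, while the amount written to disk is not capped by it.

```python
import tempfile


class SpillingBuffer:
    """Toy model (hypothetical, not the delta-rs implementation):
    hold chunks in memory up to a byte budget, then spill them to disk."""

    def __init__(self, memory_limit_bytes):
        self.memory_limit_bytes = memory_limit_bytes
        self.in_memory = []
        self.in_memory_bytes = 0
        self.spilled_bytes = 0
        # Anonymous temporary file standing in for the spill location.
        self._spill_file = tempfile.TemporaryFile()

    def push(self, chunk):
        # Spill first if accepting this chunk would exceed the memory budget.
        # (A single chunk larger than the budget would still be held in
        # memory; this sketch does not handle that case.)
        if self.in_memory_bytes + len(chunk) > self.memory_limit_bytes:
            self._spill()
        self.in_memory.append(chunk)
        self.in_memory_bytes += len(chunk)

    def _spill(self):
        # Write everything currently buffered to disk and free the memory.
        for chunk in self.in_memory:
            self._spill_file.write(chunk)
        self.spilled_bytes += self.in_memory_bytes
        self.in_memory.clear()
        self.in_memory_bytes = 0

    def close(self):
        self._spill_file.close()


buf = SpillingBuffer(memory_limit_bytes=1024)
for _ in range(10):
    buf.push(b"x" * 400)

# Memory stays under the limit; the spilled total is larger than the limit.
print(buf.in_memory_bytes, buf.spilled_bytes)  # 800 3200
buf.close()
```

In this model the 1024-byte limit bounds the in-memory buffer, yet 3200 bytes end up on disk, which is why describing the parameter as "max bytes spilled to disk" is misleading.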

adriangb commented 2 months ago

It seems like this is still the case. I can make a docs PR to update this. But this does bring up other questions I had:

  1. When it spills, where does it spill to (what is the path on disk, can that be customized, is there a size limit and can that be customized)?
  2. Is the default of 20GB reasonable at all then? That's a very large amount of RAM.