dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
2.65k stars 176 forks

Improve delta table memory footprint #2030

Open sh-rp opened 1 week ago

sh-rp commented 1 week ago

Currently, we write all jobs for one Delta table in a single write, using a reference job that references all of those jobs. There seems to be a problem in delta-rs (the Rust implementation of Delta Lake): it materializes all tables in memory before writing them to the destination:

https://github.com/delta-io/delta-rs/issues/2968#issuecomment-2453034758

Possible ways to fix this: