Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

[FEAT] Streaming Catalog Writes #3160

Closed: colin-ho closed this pull request 2 weeks ago.

colin-ho commented 3 weeks ago

Implements streaming Iceberg and Delta writes for swordfish, Daft's streaming local execution engine. Most of the write scaffolding was already implemented in https://github.com/Eventual-Inc/Daft/pull/2992; this PR adds the Iceberg- and Delta-specific functionality.
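The PR text does not show the write path itself. As a rough illustration of the general pattern behind streaming table-format writes (an assumption about its shape, not Daft's actual code), the sketch below buffers incoming rows, rolls them into data files once a row threshold is reached, and defers the catalog commit to a single `finalize()` step. This mirrors how Iceberg snapshots and Delta log entries register a batch of data files atomically. `StreamingTableWriter`, `DataFileInfo`, and `target_rows_per_file` are hypothetical names, and no Parquet is actually written.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class DataFileInfo:
    # Hypothetical metadata for one written data file (path + row count).
    path: str
    num_rows: int


@dataclass
class StreamingTableWriter:
    """Hypothetical sketch, not Daft's API: stream rows into files, commit once."""

    table_location: str
    target_rows_per_file: int
    _buffer: List[Dict] = field(default_factory=list)
    _written: List[DataFileInfo] = field(default_factory=list)

    def write_batch(self, rows: Iterable[Dict]) -> None:
        # Rows arrive incrementally; roll over to a new file at the size target.
        for row in rows:
            self._buffer.append(row)
            if len(self._buffer) >= self.target_rows_per_file:
                self._flush()

    def _flush(self) -> None:
        if not self._buffer:
            return
        # A real writer would encode self._buffer as Parquet here; this only
        # records the metadata a catalog commit needs.
        path = f"{self.table_location}/data/part-{len(self._written):05d}.parquet"
        self._written.append(DataFileInfo(path=path, num_rows=len(self._buffer)))
        self._buffer = []

    def finalize(self) -> List[DataFileInfo]:
        # Flush any remaining rows, then return every file's metadata so the
        # caller can register them all in one commit (an Iceberg snapshot or a
        # Delta log entry in a real table format).
        self._flush()
        return list(self._written)


writer = StreamingTableWriter("s3://bucket/table", target_rows_per_file=3)
for batch in ([{"x": i} for i in range(2)], [{"x": i} for i in range(2, 7)]):
    writer.write_batch(batch)
files = writer.finalize()
print([f.num_rows for f in files])  # [3, 3, 1]
```

Deferring the commit until `finalize()` means readers never see a partially written table: data files land on storage as the stream progresses, but they only become part of the table once the single commit succeeds.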

A quick TL;DR on swordfish writes:

Notes:

codspeed-hq[bot] commented 3 weeks ago

CodSpeed Performance Report

Merging #3160 will not alter performance

Comparing colin/streaming-catalog-writes-2 (073b2a5) with main (8ed174c)

Summary

✅ 17 untouched benchmarks

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 75.55556% with 110 lines in your changes missing coverage. Please review.

Project coverage is 79.13%. Comparing base (8817a08) to head (073b2a5). Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `daft/io/writer.py` | 0.00% | 65 missing |
| `daft/delta_lake/delta_lake_write.py` | 72.60% | 20 missing |
| `daft/iceberg/iceberg_write.py` | 59.45% | 15 missing |
| `src/daft-writers/src/catalog.rs` | 92.72% | 4 missing |
| `src/daft-writers/src/lib.rs` | 95.52% | 3 missing |
| `src/daft-local-execution/src/pipeline.rs` | 97.22% | 1 missing |
| `src/daft-physical-plan/src/translate.rs` | 50.00% | 1 missing |
| `src/daft-writers/src/python.rs` | 98.64% | 1 missing |
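The per-file rows and the headline patch figure (75.55556% with 110 lines missing) are consistent with each other. The sketch below infers each file's number of patch lines from its percentage and missing count, assuming patch coverage is simply covered / (covered + missing) with no partial lines; these totals are inferred, not reported by Codecov.

```python
# Rows from the "Files with missing lines" table: (file, patch %, missing lines).
ROWS = [
    ("daft/io/writer.py", 0.00, 65),
    ("daft/delta_lake/delta_lake_write.py", 72.60, 20),
    ("daft/iceberg/iceberg_write.py", 59.45, 15),
    ("src/daft-writers/src/catalog.rs", 92.72, 4),
    ("src/daft-writers/src/lib.rs", 95.52, 3),
    ("src/daft-local-execution/src/pipeline.rs", 97.22, 1),
    ("src/daft-physical-plan/src/translate.rs", 50.00, 1),
    ("src/daft-writers/src/python.rs", 98.64, 1),
]


def infer_patch_lines(patch_pct: float, missing: int) -> int:
    # missing = total * (1 - pct / 100)  =>  total = missing / (1 - pct / 100).
    # Only valid for pct < 100 (every row here has at least one missing line);
    # rounding absorbs the truncation in the reported percentage.
    return round(missing / (1 - patch_pct / 100))


per_file = {name: infer_patch_lines(pct, miss) for name, pct, miss in ROWS}
missing_total = sum(miss for _, _, miss in ROWS)
patch_total = infer_patch_lines(75.55556, missing_total)

print(missing_total)  # 110
print(patch_total)  # 450
print(sum(per_file.values()))  # 409
```

So the listed files account for all 110 missing lines out of roughly 450 patch lines (340 covered). The remaining ~41 patch lines would sit in files that are fully covered and therefore absent from this table.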
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3160      +/-   ##
==========================================
+ Coverage   78.66%   79.13%   +0.47%
==========================================
  Files         634      637       +3
  Lines       78175    77944     -231
==========================================
+ Hits        61496    61682     +186
+ Misses      16679    16262     -417
```

| Files with missing lines | Coverage Δ |
|---|---|
| `daft/execution/execution_step.py` | `89.43% <100.00%> (+0.39%)` |
| `daft/execution/physical_plan.py` | `94.01% <ø> (+0.14%)` |
| `daft/execution/rust_physical_plan_shim.py` | `95.60% <ø> (+1.03%)` |
| `daft/logical/builder.py` | `89.87% <100.00%> (+0.26%)` |
| `daft/table/table_io.py` | `88.75% <100.00%> (+3.00%)` |
| `src/daft-local-execution/src/sinks/write.rs` | `100.00% <100.00%> (ø)` |
| `src/daft-physical-plan/src/local_plan.rs` | `96.36% <100.00%> (+0.26%)` |
| `src/daft-plan/src/builder.rs` | `82.16% <100.00%> (+0.69%)` |
| `src/daft-plan/src/sink_info.rs` | `20.54% <ø> (ø)` |
| `src/daft-scheduler/src/scheduler.rs` | `93.18% <100.00%> (+0.05%)` |
| ... and 8 more | |

... and 75 files with indirect coverage changes
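The figures in the coverage diff block can be cross-checked directly. The sketch below assumes Codecov's default reporting of 2 decimal places with rounding "down" (truncation), and that every tracked line is either a hit or a miss, since the block shows no partials row.

```python
# Figures from the Codecov "Coverage Diff" block.
BASE = {"files": 634, "lines": 78175, "hits": 61496, "misses": 16679}
HEAD = {"files": 637, "lines": 77944, "hits": 61682, "misses": 16262}


def coverage_pct(hits: int, lines: int, precision: int = 2) -> float:
    # Truncate (round down) to `precision` decimals; integer floor division
    # avoids floating-point error before the final scaling step.
    scale = 10 ** precision
    return (hits * 100 * scale // lines) / scale


for snapshot in (BASE, HEAD):
    # With no partial lines, hits + misses must equal the tracked line count.
    assert snapshot["hits"] + snapshot["misses"] == snapshot["lines"]

deltas = {key: HEAD[key] - BASE[key] for key in BASE}
base_pct = coverage_pct(BASE["hits"], BASE["lines"])
head_pct = coverage_pct(HEAD["hits"], HEAD["lines"])

print(deltas)  # {'files': 3, 'lines': -231, 'hits': 186, 'misses': -417}
print(base_pct, head_pct, round(head_pct - base_pct, 2))  # 78.66 79.13 0.47
```

Truncation matters here: 61682 / 77944 is about 79.136%, which would round to 79.14% under round-half-up but is reported as 79.13%.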