Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

[CHORE] Swordfish refactors #3256

Closed: colin-ho closed this 1 week ago

colin-ho commented 2 weeks ago

There are a couple of outstanding issues, inefficiencies, and ugly spots in the swordfish code. I originally intended to break these up into smaller PRs, but while trying to split them up I realized that the changes are quite intertwined, and it may be easier on the reviewer to see all of them in one PR. That being said, I'll try my best to explain all the changes and their rationale in detail.

Problems

Proposed Changes

codspeed-hq[bot] commented 2 weeks ago

CodSpeed Performance Report

Merging #3256 will improve performance by 45.45%

Comparing colin/probe-state-bridge (e751f61) with main (711e862)

Summary

⚡ 2 improvements ✅ 15 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | colin/probe-state-bridge | Change |
|---|---|---|---|
| `test_iter_rows_first_row[100 Small Files]` | 269 ms | 237.4 ms | +13.32% |
| `test_show[100 Small Files]` | 22.3 ms | 15.3 ms | +45.45% |

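As a rough sanity check on the Change column, the sketch below recomputes the relative improvement from the rounded timings shown in the table. It assumes the change is defined as `baseline / head - 1`, which is an assumption and not something the report states; `relative_change` is a hypothetical helper. Because the table shows rounded timings, the recomputed figures can differ slightly from the reported ones.

```python
def relative_change(baseline_ms: float, head_ms: float) -> float:
    """Return the improvement of head over baseline, in percent.

    Assumed definition: (baseline / head - 1) * 100. A positive value
    means the head branch is faster than the baseline.
    """
    if head_ms <= 0:
        raise ValueError("head_ms must be positive")
    return (baseline_ms / head_ms - 1.0) * 100.0


# Rounded timings copied from the benchmark table: (main, head) in ms.
benchmarks = {
    "test_iter_rows_first_row[100 Small Files]": (269.0, 237.4),
    "test_show[100 Small Files]": (22.3, 15.3),
}

for name, (main_ms, head_ms) in benchmarks.items():
    print(f"{name}: {relative_change(main_ms, head_ms):+.2f}%")
```

The first row comes out close to the reported +13.32%. The second lands a little above the reported +45.45%, presumably because the report works from unrounded timings.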
codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 97.11316% with 25 lines in your changes missing coverage. Please review.

Project coverage is 77.59%. Comparing base (711e862) to head (e751f61). Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/daft-local-execution/src/dispatcher.rs | 93.85% | 7 Missing :warning: |
| ...local-execution/src/sinks/outer_hash_join_probe.rs | 95.70% | 7 Missing :warning: |
| ...ecution/src/intermediate_ops/actor_pool_project.rs | 84.00% | 4 Missing :warning: |
| .../daft-local-execution/src/sinks/hash_join_build.rs | 93.02% | 3 Missing :warning: |
| src/daft-local-execution/src/run.rs | 84.61% | 2 Missing :warning: |
| .../src/intermediate_ops/anti_semi_hash_join_probe.rs | 97.36% | 1 Missing :warning: |
| ...-execution/src/intermediate_ops/intermediate_op.rs | 98.41% | 1 Missing :warning: |

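The per-file counts above can be cross-checked against the headline figure. The sketch below sums the missing lines and backs out the approximate size of the patch from the reported 97.11316% patch coverage. The derived patch size is an inference from those two numbers, not a value the report states, and the file keys are the truncated paths exactly as the report shows them.

```python
# Missing-line counts per file, copied from the Codecov table
# (paths are truncated in the report and are kept as-is here).
missing_by_file = {
    "src/daft-local-execution/src/dispatcher.rs": 7,
    "...local-execution/src/sinks/outer_hash_join_probe.rs": 7,
    "...ecution/src/intermediate_ops/actor_pool_project.rs": 4,
    ".../daft-local-execution/src/sinks/hash_join_build.rs": 3,
    "src/daft-local-execution/src/run.rs": 2,
    ".../src/intermediate_ops/anti_semi_hash_join_probe.rs": 1,
    "...-execution/src/intermediate_ops/intermediate_op.rs": 1,
}

reported_patch_coverage = 97.11316  # percent, from the report headline

total_missing = sum(missing_by_file.values())

# coverage = (patch_lines - missing) / patch_lines, solved for patch_lines.
patch_lines = round(total_missing / (1.0 - reported_patch_coverage / 100.0))
covered_lines = patch_lines - total_missing
recomputed = covered_lines / patch_lines * 100.0

print(f"missing lines:       {total_missing}")
print(f"inferred patch size: {patch_lines} lines ({covered_lines} covered)")
print(f"recomputed coverage: {recomputed:.5f}%")
```

The per-file counts sum to the 25 missing lines in the headline, and the recomputed percentage agrees with the reported one to the precision shown.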
Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256/graphs/tree.svg?width=650&height=150&src=pr&token=J430QVFE89&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)

```diff
@@            Coverage Diff             @@
##             main    #3256      +/-   ##
==========================================
+ Coverage   77.50%   77.59%   +0.08%
==========================================
  Files         666      666
  Lines       81335    81621     +286
==========================================
+ Hits        63041    63331     +290
+ Misses      18294    18290       -4
```

| [Files with missing lines](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc) | Coverage Δ | |
|---|---|---|
| [src/common/runtime/src/lib.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fcommon%2Fruntime%2Fsrc%2Flib.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2NvbW1vbi9ydW50aW1lL3NyYy9saWIucnM=) | `90.78% <100.00%> (ø)` | |
| [src/daft-local-execution/src/channel.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fchannel.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9jaGFubmVsLnJz) | `98.14% <100.00%> (-0.36%)` | :arrow_down: |
| [...-local-execution/src/intermediate\_ops/aggregate.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Faggregate.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL2FnZ3JlZ2F0ZS5ycw==) | `100.00% <100.00%> (ø)` | |
| [...ft-local-execution/src/intermediate\_ops/explode.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Fexplode.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL2V4cGxvZGUucnM=) | `100.00% <100.00%> (ø)` | |
| [...aft-local-execution/src/intermediate\_ops/filter.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Ffilter.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL2ZpbHRlci5ycw==) | `100.00% <100.00%> (ø)` | |
| [...tion/src/intermediate\_ops/inner\_hash\_join\_probe.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Finner_hash_join_probe.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL2lubmVyX2hhc2hfam9pbl9wcm9iZS5ycw==) | `99.23% <100.00%> (+3.46%)` | :arrow_up: |
| [...ft-local-execution/src/intermediate\_ops/project.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Fproject.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL3Byb2plY3QucnM=) | `100.00% <100.00%> (ø)` | |
| [...aft-local-execution/src/intermediate\_ops/sample.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Fsample.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL3NhbXBsZS5ycw==) | `100.00% <100.00%> (ø)` | |
| [...ft-local-execution/src/intermediate\_ops/unpivot.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Fintermediate_ops%2Funpivot.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9pbnRlcm1lZGlhdGVfb3BzL3VucGl2b3QucnM=) | `100.00% <100.00%> (ø)` | |
| [src/daft-local-execution/src/lib.rs](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree&filepath=src%2Fdaft-local-execution%2Fsrc%2Flib.rs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc#diff-c3JjL2RhZnQtbG9jYWwtZXhlY3V0aW9uL3NyYy9saWIucnM=) | `95.29% <100.00%> (+2.43%)` | :arrow_up: |
| ... and [18 more](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc) | |

... and [3 files with indirect coverage changes](https://app.codecov.io/gh/Eventual-Inc/Daft/pull/3256/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Eventual-Inc)
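The numbers in the coverage diff are internally consistent, which the sketch below checks: hits plus misses equal the line count on each side, and hits divided by lines reproduces the coverage percentages. It assumes the percentages are truncated (not rounded) to two decimals. That is an inference from the figures themselves, not documented Codecov behavior, and `truncate2` is a hypothetical helper.

```python
import math


def truncate2(value: float) -> float:
    """Truncate a non-negative value to two decimal places (assumed display rule)."""
    return math.floor(value * 100.0) / 100.0


# Figures copied from the coverage diff: main (base) vs #3256 (head).
base = {"lines": 81335, "hits": 63041, "misses": 18294}
head = {"lines": 81621, "hits": 63331, "misses": 18290}

# Every tracked line is either hit or missed.
for side in (base, head):
    assert side["hits"] + side["misses"] == side["lines"]

base_pct = base["hits"] / base["lines"] * 100.0
head_pct = head["hits"] / head["lines"] * 100.0

print(f"base coverage: {truncate2(base_pct):.2f}%")
print(f"head coverage: {truncate2(head_pct):.2f}%")
print(f"delta:         +{truncate2(head_pct - base_pct):.2f}%")
print(f"lines added:   {head['lines'] - base['lines']:+d}")
print(f"hits added:    {head['hits'] - base['hits']:+d}")
```

Under the truncation assumption, the delta is computed from the unrounded percentages, which is why it comes out as +0.08% even though the two displayed coverage values differ by 0.09.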