Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

[FEAT] Monotonically Increasing Id for Swordfish #3180

Closed colin-ho closed 5 days ago

colin-ho commented 2 weeks ago

Implements monotonically increasing id as a streaming sink with max_concurrency = 1.

I tested multithreaded and single-threaded implementations and found no performance gain from multithreading. This is because monotonically increasing id is a memory-bound operator: all it does is allocate an array of ints for the ids. Multiple threads doing this in parallel are bottlenecked by memory bandwidth.

It is also much simpler to implement this as a single-threaded operation, since we only need to keep a running count of the lengths of the morsels seen so far. This is effectively just row_number.
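A minimal sketch of the running-count idea, not the code from this PR: `Morsel`, `MonotonicIdState`, and `assign_ids` are hypothetical names used only for illustration, and starting the ids at 0 is an assumption. Each morsel receives the id range that begins at the number of rows seen so far.

```rust
/// Hypothetical stand-in for a morsel; only its row count matters here.
struct Morsel {
    num_rows: u64,
}

/// State held by the single-threaded sink: the next id to hand out,
/// which equals the total number of rows seen so far.
#[derive(Default)]
struct MonotonicIdState {
    next_id: u64,
}

impl MonotonicIdState {
    /// Builds the id column for one morsel and advances the running count.
    /// Ids start at 0 (an assumption) and are contiguous across morsels.
    fn assign_ids(&mut self, morsel: &Morsel) -> Vec<u64> {
        let start = self.next_id;
        let end = start + morsel.num_rows;
        self.next_id = end;
        (start..end).collect()
    }
}
```

Because the only shared state is a single counter that every morsel must read and update in arrival order, running this with a concurrency of 1 avoids any coordination between workers.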

Note:

codspeed-hq[bot] commented 2 weeks ago

CodSpeed Performance Report

Merging #3180 will improve performance by ×2.2

Comparing colin/swordfish-mono-id (509e645) with main (84db665)

Summary

⚡ 2 improvements ✅ 15 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | colin/swordfish-mono-id | Change |
|---|---|---|---|
| `test_iter_rows_first_row[100 Small Files]` | 421.4 ms | 375 ms | +12.36% |
| `test_show[100 Small Files]` | 32.7 ms | 14.9 ms | ×2.2 |

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 84.84848% with 20 lines in your changes missing coverage. Please review.

Project coverage is 74.96%. Comparing base (84db665) to head (509e645). Report is 17 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...execution/src/sinks/monotonically\_increasing\_id.rs | 77.01% | 20 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3180      +/-   ##
==========================================
- Coverage   77.55%   74.96%   -2.60%
==========================================
  Files         668      669       +1
  Lines       82268    86911    +4643
==========================================
+ Hits        63807    65151    +1344
- Misses      18461    21760    +3299
```

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/daft-local-execution/src/pipeline.rs | `95.51% <100.00%> (+0.09%)` | :arrow_up: |
| src/daft-local-plan/src/plan.rs | `96.72% <100.00%> (+0.14%)` | :arrow_up: |
| src/daft-local-plan/src/translate.rs | `94.15% <100.00%> (+0.24%)` | :arrow_up: |
| src/daft-micropartition/src/python.rs | `78.32% <100.00%> (+0.04%)` | :arrow_up: |
| ...execution/src/sinks/monotonically\_increasing\_id.rs | `41.10% <77.01%> (ø)` | |

... and 39 files with indirect coverage changes
