Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0
2.39k stars, 170 forks

[FEAT] Native Runner #3178

Closed colin-ho closed 3 weeks ago

colin-ho commented 3 weeks ago

Makes swordfish a top-level runner, selectable via set_runner_native(). Also makes swordfish the default runner for development.

This PR also contains some bug fixes and test changes, which I have annotated with comments.

Additionally, this PR refactors swordfish in two ways:

  1. Buffers scan tasks based on a num_parallel_tasks parameter that takes any pushed-down limits into account.
  2. Adds an is_err check on the sender wherever the code uses a `while receiver.recv.await -> sender.send` pattern, so that the loop exits once the receiving end of that sender has been dropped. This is needed when the consumer is done receiving data, such as in a Limit, or when the user is doing iter(df) and breaks out of the iteration; both cause receivers to be dropped. The senders should recognize this and drop as well.
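
The early-exit pattern in item 2 can be sketched with the standard library's blocking channels. This is only an illustration, not the swordfish code: it assumes `std::sync::mpsc` and OS threads in place of the async channels the PR touches, and `forward` and `run_with_limit` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Forwards items from `receiver` to `sender` and returns how many sends succeeded.
/// Exits early once the downstream receiver has been dropped.
fn forward(receiver: Receiver<u64>, sender: SyncSender<u64>) -> usize {
    let mut forwarded = 0usize;
    while let Ok(item) = receiver.recv() {
        // `send` returns Err when the downstream receiver is gone,
        // so there is no point in pulling more input.
        if sender.send(item).is_err() {
            break;
        }
        forwarded += 1;
    }
    // `receiver` is dropped on return, which lets the upstream
    // producer observe the shutdown the same way.
    forwarded
}

/// Runs producer -> forwarder -> consumer, where the consumer stops after
/// `limit` items (a stand-in for a Limit or a user breaking out of iteration).
/// Returns (produced, forwarded, consumed).
fn run_with_limit(limit: usize) -> (usize, usize, usize) {
    let (src_tx, src_rx) = sync_channel::<u64>(1);
    let (out_tx, out_rx) = sync_channel::<u64>(1);

    let producer = thread::spawn(move || {
        let mut produced = 0usize;
        for i in 0..1_000_000u64 {
            // Same check on the producer side: stop once nobody is listening.
            if src_tx.send(i).is_err() {
                break;
            }
            produced += 1;
        }
        produced
    });
    let forwarder = thread::spawn(move || forward(src_rx, out_tx));

    // Consume only `limit` items, then drop the receiver.
    let consumed = out_rx.iter().take(limit).count();
    drop(out_rx);

    let produced = producer.join().unwrap();
    let forwarded = forwarder.join().unwrap();
    (produced, forwarded, consumed)
}

fn main() {
    let (produced, forwarded, consumed) = run_with_limit(3);
    println!("produced={produced} forwarded={forwarded} consumed={consumed}");
}
```

Without the `is_err` check, the forwarder would have to either panic on the failed send or keep draining its input, and the producer would run through all one million items even though the consumer only wanted three.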

github-actions[bot] commented 3 weeks ago

🚀 Deployed on https://deploy-preview-3178--daft-docs-preview.netlify.app

codspeed-hq[bot] commented 3 weeks ago

CodSpeed Performance Report

Merging #3178 will degrade performances by 60.07%

Comparing colin/native-runner (485132b) with main (2b71ffb)

Summary

⚡ 3 improvements ❌ 1 regressions ✅ 13 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | colin/native-runner | Change |
|---|---|---|---|
| ⚡ test_count[1 Small File] | 4.2 ms | 3.4 ms | +21.59% |
| ⚡ test_count[100 Small Files] | 129.6 ms | 69.3 ms | +87.03% |
| ⚡ test_iter_rows_first_row[100 Small Files] | 9,448.6 ms | 268.9 ms | ×35 |
| ❌ test_show[100 Small Files] | 16.2 ms | 40.6 ms | -60.07% |
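
The sign convention in the Change column is easy to misread: test_show going from 16.2 ms to 40.6 ms is roughly 2.5 times slower, yet it is reported as -60.07%. The figures in the table are consistent with computing the change as `main / branch - 1`. That formula is inferred from the numbers shown here, not taken from CodSpeed's documentation, and `change_percent` is a hypothetical helper. Small differences from the reported values are expected because the table shows rounded timings.

```rust
/// Relative change in percent, assuming change = baseline / candidate - 1.
/// Positive means the candidate branch is faster than the baseline.
fn change_percent(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (baseline_ms / candidate_ms - 1.0) * 100.0
}

fn main() {
    // Timings copied from the benchmark table (main, colin/native-runner).
    let rows = [
        ("test_count[100 Small Files]", 129.6, 69.3),
        ("test_show[100 Small Files]", 16.2, 40.6),
    ];
    for (name, baseline_ms, candidate_ms) in rows {
        println!("{}: {:+.2}%", name, change_percent(baseline_ms, candidate_ms));
    }
}
```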

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 80.45455% with 43 lines in your changes missing coverage. Please review.

Project coverage is 78.37%. Comparing base (2b71ffb) to head (485132b). Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| daft/context.py | 40.00% | 18 Missing :warning: |
| src/daft-local-execution/src/dispatcher.rs | 40.00% | 9 Missing :warning: |
| src/daft-local-execution/src/sources/empty_scan.rs | 57.14% | 3 Missing :warning: |
| src/daft-local-execution/src/sources/in_memory.rs | 66.66% | 3 Missing :warning: |
| daft/runners/native_runner.py | 95.74% | 2 Missing :warning: |
| ...-execution/src/intermediate_ops/intermediate_op.rs | 66.66% | 2 Missing :warning: |
| ...c/daft-local-execution/src/sinks/streaming_sink.rs | 75.00% | 2 Missing :warning: |
| daft/io/writer.py | 0.00% | 1 Missing :warning: |
| src/common/daft-config/src/lib.rs | 0.00% | 1 Missing :warning: |
| src/daft-local-execution/src/run.rs | 66.66% | 1 Missing :warning: |

... and 1 more

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3178      +/-   ##
==========================================
- Coverage   79.13%   78.37%   -0.77%
==========================================
  Files         640      641       +1
  Lines       77983    78938     +955
==========================================
+ Hits        61715    61868     +153
- Misses      16268    17070     +802
```

| Files with missing lines | Coverage Δ | |
|---|---|---|
| daft/runners/pyrunner.py | `86.34% <100.00%> (-1.29%)` | :arrow_down: |
| daft/runners/ray_runner.py | `81.01% <100.00%> (+0.06%)` | :arrow_up: |
| daft/runners/runner.py | `78.57% <100.00%> (+2.57%)` | :arrow_up: |
| src/daft-local-execution/src/pipeline.rs | `95.23% <100.00%> (+0.06%)` | :arrow_up: |
| src/daft-local-execution/src/sources/scan_task.rs | `75.55% <100.00%> (+3.05%)` | :arrow_up: |
| src/daft-local-plan/src/plan.rs | `96.38% <100.00%> (+0.02%)` | :arrow_up: |
| src/daft-local-plan/src/translate.rs | `93.63% <100.00%> (+0.04%)` | :arrow_up: |
| daft/io/writer.py | `0.00% <0.00%> (ø)` | |
| src/common/daft-config/src/lib.rs | `82.14% <0.00%> (-5.36%)` | :arrow_down: |
| src/daft-local-execution/src/run.rs | `87.96% <66.66%> (-0.58%)` | :arrow_down: |
| ... and 8 more | | |

... and 6 files with indirect coverage changes

samster25 commented 3 weeks ago

@colin-ho it looks like this introduces a regression in the test_show[100 Small Files] benchmark. Can you take a look?