Closed colin-ho closed 1 week ago
Comparing colin/probe-state-bridge
(e751f61) with main
(711e862)
⚡ 2
improvements
✅ 15
untouched benchmarks
Benchmark | main |
colin/probe-state-bridge |
Change | |
---|---|---|---|---|
⚡ | test_iter_rows_first_row[100 Small Files] |
269 ms | 237.4 ms | +13.32% |
⚡ | test_show[100 Small Files] |
22.3 ms | 15.3 ms | +45.45% |
Attention: Patch coverage is 97.11316%
with 25 lines
in your changes missing coverage. Please review.
Project coverage is 77.59%. Comparing base (
711e862
) to head (e751f61
). Report is 10 commits behind head on main.
There's a couple outstanding issues / inefficiencies / ugliness in the swordfish code. I originally intended of breaking these up into smaller PRs, but during the process of trying to split it up, I realized that all the changes are quite intertwined, and it may be easier on the reviewer to just see all of them in one. That being said, I'll try my best to explain all the changes and rationale in detail.
Problems
PipelineChannel
abstraction doesn't play well with streaming sinks, in fact, in only really works for intermediate ops. This is because thePipelineChannel
assumes sending/receiving is round robin, but streaming sinks can send data from the workers, as well as after the finalize.execute
orfinalize
methods. These don't need to be spawned on the compute runtime.Proposed Changes
Receiver<Arc<Micropartition>>
.ProbeStateBridge
.loole
channels, which are multi-producer multi-consumer. For consistency, useloole
channels across all swordfish. In the future we can think about implementing our custom channels. https://users.rust-lang.org/t/loole-a-safe-async-sync-multi-producer-multi-consumer-channel/113785