How do you minimize/optimize data movement?

TimelyDataflow / timely-dataflow

A modular implementation of timely dataflow in Rust

MIT License

3.29k stars 272 forks source link

You are able to use exchange to place the data however you like, which sounds like it will probably be by bar. With the data distributed by bar, you now have the guarantee that (foo, bar) keys are also at the same workers, and so operations that need to group by (foo, bar) will be correct if executed without further data exchange.

Now, there's nothing in timely that protects you from first exchanging by bar, then calling some method that re-exchanges things by (foo, bar); perhaps you actually want to re-shuffle the data to balance the load; we don't know and aren't going to prevent it. So to get this to do the thing you want, you'll need to be careful to avoid re-shuffling the data, and that might be tricky if you are hoping to rely on certain pre-fab timely operators; they generally do not have variants that say "trust me; just run w/o any data exchange".

TimelyDataflow / timely-dataflow

How do you minimize/optimize data movement? #160