apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Rust] [DataFusion] Add support for eliminating hash repartition #28227

Open asfimport opened 3 years ago

asfimport commented 3 years ago

If the intermediate data is already hash-partitioned on a given expression (the join key), the repartition does not need to be added when planning a join, or it should be removed by an optimization rule. This avoids a redundant repartition (and possibly a shuffle in Ballista).
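
To make the idea concrete, here is a minimal sketch of such an optimization rule on a toy plan tree. It does not use DataFusion's actual API: the `Plan` and `Partitioning` types, `output_partitioning`, and `eliminate_redundant_repartition` are hypothetical names made up for illustration, and the join's output partitioning is simplified to that of its left input.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.

/// How the output of a plan node is split across partitions.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    /// Partition count is known, but not which rows land where.
    Unknown(usize),
    /// Rows are hash-partitioned on the given key columns.
    Hash(Vec<String>, usize),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, partitioning: Partitioning },
    Repartition { input: Box<Plan>, scheme: Partitioning },
    HashJoin { left: Box<Plan>, right: Box<Plan>, on: Vec<(String, String)> },
}

impl Plan {
    /// Partitioning of the rows this node produces.
    fn output_partitioning(&self) -> Partitioning {
        match self {
            Plan::Scan { partitioning, .. } => partitioning.clone(),
            Plan::Repartition { scheme, .. } => scheme.clone(),
            // Simplification (assumption): the join keeps the left side's partitioning.
            Plan::HashJoin { left, .. } => left.output_partitioning(),
        }
    }
}

/// Drops a hash repartition whose input is already hash-partitioned
/// on exactly the same keys with the same partition count.
fn eliminate_redundant_repartition(plan: Plan) -> Plan {
    match plan {
        Plan::Repartition { input, scheme } => {
            // Optimize the child first so nested repartitions collapse too.
            let input = eliminate_redundant_repartition(*input);
            let redundant = matches!(scheme, Partitioning::Hash(..))
                && input.output_partitioning() == scheme;
            if redundant {
                input
            } else {
                Plan::Repartition { input: Box::new(input), scheme }
            }
        }
        Plan::HashJoin { left, right, on } => Plan::HashJoin {
            left: Box::new(eliminate_redundant_repartition(*left)),
            right: Box::new(eliminate_redundant_repartition(*right)),
            on,
        },
        // Leaf nodes have nothing to rewrite.
        other => other,
    }
}
```

In this sketch a join input that is already hashed on the join key loses its `Repartition` node, while an input with unknown partitioning keeps it. A real rule would also need to compare arbitrary key expressions rather than column names, and could treat equivalent schemes as matching, which this toy equality check does not attempt.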

Reporter: Daniël Heres / @Dandandan

Note: This issue was originally created as ARROW-12439. Please see the migration documentation for further details.

asfimport commented 2 years ago

Joris Van den Bossche / @jorisvandenbossche: Should this be moved to https://github.com/apache/arrow-rs/ or https://github.com/apache/arrow-datafusion?