apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++][Compute] GroupBy: add parallelism to hash group by #28468

Open asfimport opened 3 years ago

asfimport commented 3 years ago

Implement parallel processing for hash group by. Make sure it scales well across different group cardinalities, from a handful of groups to nearly one group per row.
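For illustration only, one common way to parallelize a hash group by is to give each thread a private hash table over its slice of the input and then merge the partial tables. The sketch below assumes that approach; it is not Arrow's implementation, and `ParallelGroupBySum` is a hypothetical helper built only on the C++ standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not an Arrow API): sums values[i] per keys[i].
// Each worker aggregates a contiguous slice into its own hash table,
// so no locking is needed during the scan.
std::unordered_map<int64_t, int64_t> ParallelGroupBySum(
    const std::vector<int64_t>& keys, const std::vector<int64_t>& values,
    unsigned num_threads) {
  num_threads = std::max(1u, num_threads);
  const std::size_t n = std::min(keys.size(), values.size());
  const std::size_t chunk = (n + num_threads - 1) / num_threads;

  std::vector<std::unordered_map<int64_t, int64_t>> partials(num_threads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([&keys, &values, &partials, t, begin, end] {
      auto& local = partials[t];  // thread-private table
      for (std::size_t i = begin; i < end; ++i) {
        local[keys[i]] += values[i];
      }
    });
  }
  for (auto& w : workers) w.join();

  // Single-threaded merge of the partial tables. Its cost grows with the
  // number of distinct groups, which is why cardinality matters for scaling.
  std::unordered_map<int64_t, int64_t> result;
  for (const auto& partial : partials) {
    for (const auto& kv : partial) {
      result[kv.first] += kv.second;
    }
  }
  return result;
}
```

With few groups, the per-thread tables stay small and the merge is cheap. With many groups, each thread may hold close to a full copy of the key set and the serial merge can dominate, so a scheme that partitions rows by key hash before aggregating may scale better in that regime.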

Reporter: Michal Nowakiewicz / @michalursa

Note: This issue was originally created as ARROW-12726. Please see the migration documentation for further details.

asfimport commented 2 years ago

Michal Nowakiewicz / @michalursa: Parallelism is now implemented in hash group by. What remains is to benchmark it across a range of dimensions, fix any performance issues found, and investigate and implement improvements for different group cardinalities.