fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0
1.92k stars 94 forks source link

[FEATURE] Implement all partitioning strategies for Dask #500

Closed goodwanghan closed 11 months ago

goodwanghan commented 11 months ago

Currently, Dask partitioning implementation is very incomplete. It mostly relies on Dask's own partitioning logic, and it only supports even partitioning when there is no partition keys, and the approach is not scalable either. So we need to implement all partitioning algos: "hash", "rand" and "even" for cases with or without partition keys.