fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0
1.92k stars 94 forks source link

[FEATURE] Simplify zip/comap, remove join from the implementation. #501

Closed goodwanghan closed 11 months ago

goodwanghan commented 11 months ago

Currently zip and comap logic is convoluted. It also relies on join operation that is not well supported by Ray. So we need to use pure map and group map operations to realize this operation.

Actually zip and comap should always stay together, so in the long run we may need a breaking change to merge these two functions. If we can merge we may also get rid of the need of dataframe metadata, which is extremely hard to maintain.

But as the first step, we will clean up the logic and remove joins first.