fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0

[FEATURE] `as_dicts` for DataFrames #521

Closed goodwanghan closed 8 months ago

goodwanghan commented 8 months ago

Is your feature request related to a problem? Please describe. We currently only have `as_dict_iterable`, but producing an iterable can make the execution behavior very different. Spark is the best example: when there are many partitions (thousands), `as_dict_iterable` can be very slow because it goes through the data partition by partition.

Describe the solution you'd like. `as_dicts` means getting the dicts as a whole. With this semantic, we could collect all the data in parallel and then convert it to dicts.
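
To illustrate the difference between the two semantics, here is a minimal sketch using only the standard library. It is not Fugue's implementation: the partitioned data, `fetch_partition`, `toy_as_dict_iterable`, and `toy_as_dicts` are all hypothetical stand-ins, and a thread pool stands in for a backend's parallel collect.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, List, Tuple

# Toy stand-in for a partitioned DataFrame (assumption, not Fugue's data model):
# a list of column names plus a list of partitions, each a list of row tuples.
COLUMNS = ["a", "b"]
PARTITIONS: List[List[Tuple[Any, ...]]] = [
    [(1, "x"), (2, "y")],
    [(3, "z")],
    [(4, "w")],
]


def fetch_partition(index: int) -> List[Tuple[Any, ...]]:
    """Hypothetical helper: pretend to pull one partition from the cluster."""
    return PARTITIONS[index]


def toy_as_dict_iterable() -> Iterable[Dict[str, Any]]:
    """Lazy semantic: partitions are fetched one at a time, in order."""
    for i in range(len(PARTITIONS)):
        for row in fetch_partition(i):
            yield dict(zip(COLUMNS, row))


def toy_as_dicts() -> List[Dict[str, Any]]:
    """Eager semantic: fetch all partitions concurrently, then convert."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so row order is preserved.
        parts = list(pool.map(fetch_partition, range(len(PARTITIONS))))
    return [dict(zip(COLUMNS, row)) for part in parts for row in part]
```

Both functions yield the same rows; the only difference is that the eager version is free to fetch every partition at once because the caller has asked for the whole result.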