fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0

[FEATURE] Use `_collect_as_arrow` for `fugue_api.as_arrow(spark_df)` #516

Closed. ion-elgreco closed this issue 9 months ago

ion-elgreco commented 9 months ago

Is your feature request related to a problem? Please describe.

Converting a Spark DataFrame to Arrow can be done with a private method in PySpark, `_collect_as_arrow`: https://github.com/apache/spark/blob/06ccb6d434476afacc08936cf473670102d41010/python/pyspark/sql/pandas/conversion.py#L244

goodwanghan commented 9 months ago

This is a great idea. I will release a dev version.

goodwanghan commented 9 months ago

@ion-elgreco please try 0.8.7.dev5, it uses _collect_as_arrow

ion-elgreco commented 9 months ago

Nice, I'll try it out later this week!