fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0

[BUG] Pandas 2+ and Spark < 3.4 can't work together #476

Status: Closed (goodwanghan closed this 1 year ago)

goodwanghan commented 1 year ago

Pandas 2+ and Spark < 3.4 can't work together because pyspark < 3.4 relies on pandas features that were removed in pandas 2.0, such as `iteritems`.

However, there are workarounds. Since many users have not yet upgraded to the latest Spark but still want to use Pandas 2, we should implement the workarounds so that the two work together.
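One commonly used workaround of this kind is to restore the removed method as an alias of its replacement before pyspark calls it. The sketch below is only an illustration, not necessarily what Fugue ended up shipping: `restore_alias` and `patch_pandas_for_old_pyspark` are hypothetical helper names, and it assumes `iteritems` is the only missing attribute that matters.

```python
def restore_alias(cls, removed_name, current_name):
    """Re-add `removed_name` on `cls` as an alias of `current_name`.

    Returns True if the alias was added, False if `cls` already had it.
    (Hypothetical helper, named here for illustration only.)
    """
    if hasattr(cls, removed_name):
        return False
    setattr(cls, removed_name, getattr(cls, current_name))
    return True


def patch_pandas_for_old_pyspark():
    """Restore `iteritems` on pandas objects when pandas 2+ has dropped it.

    Assumption: aliasing `iteritems` to `items` is sufficient for the
    pyspark code path in question; other removed APIs are not covered.
    """
    import pandas as pd  # imported lazily so the helper above stays generic

    patched = []
    for cls in (pd.DataFrame, pd.Series):
        if restore_alias(cls, "iteritems", "items"):
            patched.append(cls.__name__)
    return patched
```

Calling `patch_pandas_for_old_pyspark()` once, before creating Spark DataFrames from pandas data, would be the intended usage. On pandas 1.x it does nothing because the attribute still exists.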

We should also take this chance to do more extensive Spark tests with different version combinations of pyspark and pandas.
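As a rough sketch of what such a test matrix could look like, the snippet below enumerates version pairs and marks the ones that fall in the incompatible range described above (pandas 2+ with pyspark < 3.4). The version lists and the names `parse_major_minor`, `needs_workaround`, and `build_matrix` are hypothetical and chosen only for illustration; the real CI configuration may differ.

```python
from itertools import product

# Hypothetical version lists, for illustration only.
PANDAS_VERSIONS = ["1.5.3", "2.0.3"]
PYSPARK_VERSIONS = ["3.3.2", "3.4.1"]


def parse_major_minor(version):
    """Return (major, minor) as ints; a missing minor part is treated as 0."""
    parts = [int(p) for p in version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def needs_workaround(pandas_version, pyspark_version):
    """True when the pair falls in the range this issue describes."""
    return (
        parse_major_minor(pandas_version) >= (2, 0)
        and parse_major_minor(pyspark_version) < (3, 4)
    )


def build_matrix(pandas_versions, pyspark_versions):
    """List every (pandas, pyspark, needs_workaround) combination."""
    return [
        (pd_v, ps_v, needs_workaround(pd_v, ps_v))
        for pd_v, ps_v in product(pandas_versions, pyspark_versions)
    ]


if __name__ == "__main__":
    for pd_v, ps_v, flag in build_matrix(PANDAS_VERSIONS, PYSPARK_VERSIONS):
        print(f"pandas=={pd_v} pyspark=={ps_v} workaround_needed={flag}")
```

Each tuple could then drive one CI job or one parametrized test run, with the flagged pairs being the ones where the workarounds must be exercised.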