alteryx / featuretools

An open source python library for automated feature engineering
https://www.featuretools.com
BSD 3-Clause "New" or "Revised" License
7.28k stars 878 forks

Remove support for Dask and Spark dataframes #2704

Closed by thehomebrewnerd 7 months ago

thehomebrewnerd commented 7 months ago

Maintaining support for creating EntitySets from Dask and Spark dataframes has been problematic, as newer versions of dask and pyspark introduce breaking changes. Additionally, the implementations for those dataframe types are still incomplete relative to the pandas implementation, with several features and many primitives still unsupported after several years. We should remove support for these dataframe types and focus Featuretools on what it does best: generating features for pandas dataframes.

We should keep the ability to do parallel calculation of a feature matrix using Dask, however.

AlpAribal commented 7 months ago

Would you consider adding an Ibis backend? Ibis, in turn, supports 20+ backends (including Spark and Dask). #1913 would be solved for free as well.

thehomebrewnerd commented 7 months ago

@AlpAribal It would be great to find a way to more easily support different dataframe types in Featuretools, especially if it would ease the maintenance burden inside Featuretools. I think that probably requires a much deeper study before we launch into that effort, though. In any case, thank you for the suggestion and idea!