bjherger / keras-pandas

keras-pandas allows users to rapidly build and iterate on deep learning models.
MIT License

Could keras-pandas work with Dask? #107

Open solalatus opened 5 years ago

solalatus commented 5 years ago

I am very fond of the "humane" approach keras-pandas takes to getting tabular data into DL models. As far as I can see, Pandas is still something of a stepchild in TF (there's no obvious reference to it in the 2.0 docs yet), so I don't see a good solution coming from the TF side for large, larger-than-memory datasets processed with Pandas-like tools. I was wondering whether keras-pandas could work with a Dask DataFrame instead of a Pandas one, so that it could scale to bigger datasets. This might also make a good use case here: https://github.com/dask/dask-ml/issues/268 (though it's less a Dask-ML question than a general Dask DataFrame one).
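Why a Dask DataFrame would help: the column statistics a preprocessing step needs (for example the mean and standard deviation used to scale a numerical column) can be computed one partition at a time and then merged, so the whole table never has to be in memory at once. Below is a minimal pure-Python sketch of that idea, assuming each partition is simply a list of floats. `Stats`, `partition_stats`, `merge_stats`, `fit_scaler` and `scale` are hypothetical names for illustration, not part of keras-pandas or Dask; the merge step is the well-known parallel variance update (Chan et al.).

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable


@dataclass(frozen=True)
class Stats:
    """Summary of one numerical column: count, mean, sum of squared deviations."""
    n: int
    mean: float
    m2: float

    @property
    def std(self) -> float:
        # Population standard deviation (ddof=0).
        return sqrt(self.m2 / self.n) if self.n else 0.0


EMPTY = Stats(0, 0.0, 0.0)


def partition_stats(values: Iterable[float]) -> Stats:
    """Summarize one partition, i.e. a chunk small enough to fit in memory."""
    vals = list(values)
    if not vals:
        return EMPTY
    mean = sum(vals) / len(vals)
    m2 = sum((v - mean) ** 2 for v in vals)
    return Stats(len(vals), mean, m2)


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Combine two partition summaries with the parallel variance update."""
    n = a.n + b.n
    if n == 0:
        return EMPTY
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Stats(n, mean, m2)


def fit_scaler(partitions: Iterable[Iterable[float]]) -> Stats:
    """Fit scaling statistics partition by partition.

    If `partitions` is lazy (e.g. a generator reading chunks from disk),
    only one partition is held in memory at a time.
    """
    total = EMPTY
    for part in partitions:
        total = merge_stats(total, partition_stats(part))
    return total


def scale(value: float, stats: Stats) -> float:
    """Standardize a value with the fitted statistics (returns 0.0 if std == 0)."""
    return (value - stats.mean) / stats.std if stats.std else 0.0
```

For example, `fit_scaler([[1.0, 2.0], [3.0, 4.0, 5.0]])` gives `n=5`, `mean=3.0` and a `std` of about 1.414 (the population standard deviation of 1..5), the same as fitting on all five values at once. With an actual Dask DataFrame you would not write this yourself: a call such as `ddf["x"].mean().compute()` performs a partition-wise reduction for you (note that Dask's `std()` defaults to `ddof=1`, unlike this sketch).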

What are your thoughts?

Thanks for the input and for this great lib, I am spreading the word about it! :+1:

bjherger commented 5 years ago

Hey, great idea! Dask support shouldn't be too difficult, and switching the internal data type from pandas to dask would make it easy to scale to larger data while supporting both pandas and dask.

I'd be happy to support a PR if @solalatus or someone else has the time to put together code to replace pandas w/ dask internally, and provide support for both pandas and dask.
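One way to read "support for both pandas and dask": write the internal preprocessing only against methods the two DataFrame APIs share, and force evaluation only where a result is lazy. A minimal sketch, assuming duck typing on `.compute()` (which Dask collections and scalars provide and pandas objects do not); `materialize` and `fit_numeric_column` are hypothetical helpers, not existing keras-pandas code.

```python
from typing import Any, Dict


def materialize(obj: Any) -> Any:
    """Return a concrete value for a pandas or Dask result.

    Dask results are lazy and expose ``.compute()``; pandas results are
    already concrete and are returned unchanged.
    """
    compute = getattr(obj, "compute", None)
    return compute() if callable(compute) else obj


def fit_numeric_column(df: Any, column: str) -> Dict[str, float]:
    """Hypothetical fit step that accepts a pandas *or* a Dask DataFrame.

    It only calls methods both libraries share (``df[col].mean()`` and
    ``df[col].std()``), so one code path serves either backend.
    """
    series = df[column]
    return {
        "mean": float(materialize(series.mean())),
        "std": float(materialize(series.std())),
    }
```

Calling `.compute()` separately on each statistic makes Dask scan the data once per statistic; a real implementation would more likely use `dask.compute(mean, std)`, which evaluates several lazy results together. Both pandas' and Dask's `std()` default to `ddof=1`, so the two backends agree here.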

Sammi-Smith commented 4 years ago

@bjherger Is Dask support still a work in progress? Any suggestions for what to do in the meantime to be able to use the features of keras-pandas with Pandas DFs that are considerably too big to fit into memory?
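A stopgap that needs neither Dask nor keras-pandas: stream the data from disk in batches and hand Keras a generator, so only one batch is in memory at a time. A minimal stdlib sketch, assuming the data is a CSV with a header row, purely numerical feature columns and a numerical target; `csv_batches` is a hypothetical helper. With pandas, `pd.read_csv(path, chunksize=...)` gives a similar iterator of DataFrame chunks.

```python
import csv
from typing import Iterable, Iterator, List, Sequence, Tuple

Batch = Tuple[List[List[float]], List[float]]


def csv_batches(
    lines: Iterable[str],
    feature_cols: Sequence[str],
    target_col: str,
    batch_size: int,
) -> Iterator[Batch]:
    """Yield (features, targets) batches from CSV lines without loading them all.

    `lines` can be an open text file or any iterable of CSV lines (header
    first); only the current batch of parsed rows is kept in memory.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    xs: List[List[float]] = []
    ys: List[float] = []
    for row in csv.DictReader(lines):
        xs.append([float(row[c]) for c in feature_cols])
        ys.append(float(row[target_col]))
        if len(xs) == batch_size:
            yield xs, ys
            xs, ys = [], []  # start a fresh batch; the yielded lists stay untouched
    if xs:  # final, possibly smaller batch
        yield xs, ys
```

Usage would look like `with open("big.csv", newline="") as f:` followed by `for x, y in csv_batches(f, ["a", "b"], "y", batch_size=1024): ...` (the file and column names are placeholders). In TF 2.x, `tf.keras` `Model.fit` accepts a Python generator; you would typically convert each batch with `numpy.asarray`, reopen the file for every epoch (or implement `keras.utils.Sequence`), and fit any scaling or encoding statistics in a separate pass beforehand, since keras-pandas' automatic preprocessing is not applied here.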