Open solalatus opened 5 years ago
Hey, great idea! Dask support shouldn't be too difficult, and changing the internal data type from pandas to dask would allow easy data scaling, and support for both pandas and dask.
I'd be happy to support a PR if @solalatus or someone else has the time to put together code to replace pandas w/ dask internally, and provide support for both pandas and dask.
@bjherger Is Dask support still a work in progress? Any suggestions for what to do in the meantime to be able to use the features of keras-pandas with Pandas DFs that are considerably too big to fit into memory?
I am very fond of keras-pandas's "humane" approach to getting tabular data into DL models. As far as I can see, Pandas is still the poster child in TF. (No obvious reference to it in the 2.0 docs yet.) So I believe that for large, bigger-than-memory datasets processed with Pandas-like tools, there is no good solution in sight from the TF side. I was wondering whether keras-pandas could work with a Dask DF instead of a Pandas one, so that it can scale to bigger datasets. This might make a good use case here: https://github.com/dask/dask-ml/issues/268 (Though this is not just a Dask-ML question, but more of a general Dask DF one...)
What are your thoughts?
Thanks for the input and for this great lib, I am spreading the word about it! :+1:
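For the bigger-than-memory case raised in this thread, one possible stopgap is to stream the data in chunks with plain pandas and hand each chunk to the model as a batch. The snippet below is only a minimal sketch of that idea, not something keras-pandas provides: `iter_batches`, the column names, and the tiny in-memory CSV are all hypothetical, and it assumes the columns are already numeric.

```python
import io

import pandas as pd


def iter_batches(csv_source, feature_cols, target_col, chunksize=2):
    """Yield (features, target) NumPy arrays one chunk at a time.

    Hypothetical helper for illustration; not part of keras-pandas.
    """
    # Passing chunksize makes read_csv return an iterator of DataFrames,
    # so only one chunk is held in memory at a time.
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        X = chunk[feature_cols].to_numpy(dtype="float32")
        y = chunk[target_col].to_numpy(dtype="float32")
        yield X, y


# Tiny in-memory stand-in for a file that would be too large to load at once.
demo_csv = io.StringIO("a,b,label\n1,2,0\n3,4,1\n5,6,0\n")
batches = list(iter_batches(demo_csv, ["a", "b"], "label", chunksize=2))

for X, y in batches:
    print(X.shape, y.shape)
```

A generator like this could in principle be passed to a Keras model's `fit` call. Any preprocessing that needs global statistics (scaling, category vocabularies) would still have to be fit separately, for example on a sample that does fit into memory.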