adriangb / scikeras

Scikit-Learn API wrapper for Keras.
https://www.adriangb.com/scikeras/
MIT License
239 stars, 47 forks

Multidimensional data inputs #303

Open azagajewski opened 1 year ago

azagajewski commented 1 year ago

I've been trying to train a convolutional neural network on a large, out-of-memory dataset, using SciKeras as a bridge between Keras and Dask. From some experiments, it seems KerasRegressor enforces the scikit-learn expectation that data be a 2D array of shape (n_samples, n_features).

Is there a way to overcome the limitation? Obviously reshaping image inputs to fit the API is not ideal.

adriangb commented 1 year ago

We support transformations such as taking a single 2D input and turning it into multiple arbitrary TensorFlow inputs: https://adriangb.com/scikeras/stable/notebooks/DataTransformers.html#4.-Multiple-inputs, but this preserves the behavior of "Scikit-Learn API on the outside, TensorFlow APIs on the inside".
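To make the "2D on the outside, arbitrary shape on the inside" idea concrete, here is a minimal sketch using only NumPy. It is not SciKeras code: the function names `flatten_images` and `restore_images` and the 28x28x3 image shape are hypothetical, chosen for illustration. It only shows that an image batch can be passed around as (n_samples, n_features) and restored losslessly before it reaches the model.

```python
import numpy as np

# Hypothetical image shape for illustration: 28x28 pixels, 3 channels.
IMAGE_SHAPE = (28, 28, 3)


def flatten_images(images: np.ndarray) -> np.ndarray:
    """Collapse (n_samples, H, W, C) into the 2D (n_samples, H*W*C) layout."""
    return images.reshape(images.shape[0], -1)


def restore_images(flat: np.ndarray, image_shape=IMAGE_SHAPE) -> np.ndarray:
    """Undo flatten_images: (n_samples, H*W*C) back to (n_samples, H, W, C)."""
    return flat.reshape((flat.shape[0], *image_shape))


# Small random batch standing in for real image data.
rng = np.random.default_rng(0)
batch = rng.random((4, *IMAGE_SHAPE))

flat = flatten_images(batch)     # what the Scikit-Learn-style API would see
restored = restore_images(flat)  # what the Keras model would see
```

The linked notebook describes overriding the wrapper's feature encoder to apply this kind of transformation on the way into the model. How `restore_images` would be wired in there is not shown here and should be checked against that notebook.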

What you're asking for is non-Scikit-Learn APIs on the outside of SciKeras, which is something we don't support and don't plan on supporting. I understand why you want the feature (and you're not the first one to ask for it), but supporting it opens up a whole can of worms about what SciKeras should and shouldn't support. I think that runs the risk of making everything else more complicated by turning SciKeras into a lot more than a shim between Scikit-Learn and TensorFlow. Considering I receive $0 in donations for this work right now, and ML is not the main focus of my current day job, I don't want to go down that path.