dask / dask-searchcv

dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
BSD 3-Clause "New" or "Revised" License

Incompatibility With Keras Scikit-Learn Wrapper #69

Closed AlexSchuy closed 6 years ago

AlexSchuy commented 6 years ago

The Keras neural-network package has a sklearn wrapper that works with the sklearn RandomizedSearchCV and GridSearchCV classes. However, it fails with the dask-searchcv equivalents. Thus, there seem to be additional requirements beyond the sklearn estimator interface that must be met in order for dask-searchcv to work. Would it be possible to list these, such that other projects could be adapted to be used with dask-searchcv?

mrocklin commented 6 years ago

Can you provide a simple example that shows the failure that you're experiencing?

AlexSchuy commented 6 years ago

The following code, which uses dask_searchcv.GridSearchCV, crashes. If you comment out the dask_searchcv import and uncomment the sklearn import, it runs.

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from dask_searchcv import GridSearchCV
#from sklearn.model_selection import GridSearchCV

def simple_nn(hidden_neurons):
  model = Sequential()
  model.add(Dense(hidden_neurons, activation='relu', input_dim=30))
  model.add(Dense(1, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
  return model

param_grid = {'hidden_neurons': [100, 200, 300]}
cv = GridSearchCV(KerasClassifier(build_fn=simple_nn), param_grid)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
cv.fit(X_train, y_train)
score = cv.score(X_test, y_test)
print('score = {} on test set with params={}.'.format(score, cv.best_params_))

mrocklin commented 6 years ago

Thanks for the example. Could I ask you for one more thing and include the full traceback and exception that you get when it fails?

AlexSchuy commented 6 years ago

  File "kerasexample.py", line 21, in <module>
    cv.fit(X_train, y_train)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask_searchcv/model_selection.py", line 867, in fit
    out = scheduler(dsk, keys, num_workers=n_jobs)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/threaded.py", line 75, in get
    pack_exception=pack_exception, **kwargs)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/local.py", line 521, in get_async
    raise_exception(exc, tb)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask_searchcv/methods.py", line 280, in fit_and_score
    fields, params, fit_params)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/dask_searchcv/methods.py", line 216, in fit
    est.fit(X, y, **fit_params)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 203, in fit
    return super(KerasClassifier, self).fit(x, y, **kwargs)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/wrappers/scikit_learn.py", line 147, in fit
    history = self.model.fit(x, y, **fit_args)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/models.py", line 960, in fit
    validation_steps=validation_steps)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1657, in fit
    validation_steps=validation_steps)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1213, in _fit_loop
    outs = f(ins_batch)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2357, in __call__
    **self.session_kwargs)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/phys/users/schuya/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1067, in _run

mrocklin commented 6 years ago

I believe that TensorFlow may have an issue where it doesn't like running in multiple Python threads. cc @bnaul who has dealt with this before. You might also want to do a web search on TensorFlow, Python, and Threads.

mrocklin commented 6 years ago

To answer your original question: to use Dask with the multi-threaded scheduler, your code must be safe to run in multiple threads (most code is; TensorFlow is an exception). To use Dask with the multiprocessing or distributed schedulers, your code must be serializable (most code is). You can always fall back to the single-threaded scheduler to see how things work out. To try this, run the following line:

dask.set_options(get=dask.local.get_sync)
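
As a rough way to check the serialization requirement mentioned above, the sketch below tests whether an object survives a round trip through the standard-library pickle module. The helper name is_picklable is hypothetical, and pickle is used only as a stand-in: Dask's own serialization may accept objects that plain pickle rejects, so treat a failure here as a hint rather than a verdict.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        # pickle raises different exception types depending on the object,
        # so this check treats any failure as "not serializable".
        return False


# A plain parameter grid is ordinary Python data and serializes fine.
param_grid = {'hidden_neurons': [100, 200, 300]}
grid_ok = is_picklable(param_grid)

# A thread lock is a common example of state that cannot be pickled.
lock_ok = is_picklable(threading.Lock())

print('param_grid picklable: {}'.format(grid_ok))
print('lock picklable: {}'.format(lock_ok))
```

The same check can be pointed at an estimator instance before handing it to the multiprocessing or distributed schedulers. Note also that in newer Dask releases the single-threaded scheduler is selected with dask.config.set(scheduler='synchronous') rather than dask.set_options.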