Issue closed by AlexSchuy.
Can you provide a simple example that shows the failure that you're experiencing?
On Sat, Jan 6, 2018 at 2:03 PM, Alex Schuy notifications@github.com wrote:
The Keras neural-network package has a sklearn wrapper that works with the sklearn RandomizedSearchCV and GridSearchCV classes. However, it fails with the dask-searchcv equivalents. Thus, there seem to be additional requirements beyond the sklearn estimator interface that must be met in order for dask-searchcv to work. Would it be possible to list these, such that other projects could be adapted to be used with dask-searchcv?
The following code using dask_searchcv.GridSearchCV crashes, but if you comment out the dask_searchcv import and uncomment the sklearn import, it runs.
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Swap these two imports to compare dask-searchcv with scikit-learn.
from dask_searchcv import GridSearchCV
# from sklearn.model_selection import GridSearchCV


def simple_nn(hidden_neurons):
    """Build a one-hidden-layer binary classifier for the 30 breast-cancer features."""
    model = Sequential()
    model.add(Dense(hidden_neurons, activation='relu', input_dim=30))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    return model


# Search over the size of the hidden layer.
param_grid = {'hidden_neurons': [100, 200, 300]}
cv = GridSearchCV(KerasClassifier(build_fn=simple_nn), param_grid)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

cv.fit(X_train, y_train)
# The score is computed on the held-out test split.
score = cv.score(X_test, y_test)
print('score = {} on test set with params={}.'.format(score, cv.best_params_))
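For context on what the search does with `param_grid`: GridSearchCV expands it into one candidate per combination of values and fits a fresh copy of the estimator for each candidate on each cross-validation fold, and dask-searchcv runs those fits as separate tasks. The stdlib sketch below only illustrates the expansion step; `expand_grid` is a hypothetical helper, not part of the sklearn or dask-searchcv API (sklearn's own equivalent is `sklearn.model_selection.ParameterGrid`).

```python
from itertools import product


def expand_grid(param_grid):
    """Return one dict per combination of values in ``param_grid``.

    Hypothetical helper that mirrors what sklearn's ParameterGrid yields
    for a single dict; keys are sorted so the order is deterministic.
    """
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


param_grid = {'hidden_neurons': [100, 200, 300]}
candidates = expand_grid(param_grid)
print(candidates)
# [{'hidden_neurons': 100}, {'hidden_neurons': 200}, {'hidden_neurons': 300}]
```

With KerasClassifier, each candidate roughly corresponds to a `simple_nn(hidden_neurons=...)` call per fold, so the same `build_fn` ends up running many times, possibly concurrently depending on the scheduler.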
Thanks for the example. Could I ask for one more thing: the full traceback and exception that you get when it fails?
On Sat, Jan 6, 2018 at 2:38 PM, Alex Schuy notifications@github.com wrote:
File "kerasexample.py", line 21, in
I believe that TensorFlow may have an issue where it doesn't like running in multiple Python threads. cc @bnaul who has dealt with this before. You might also want to do a web search on TensorFlow, Python, and Threads.
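One common way a library becomes unsafe to use across threads is hidden per-thread global state (for example a "current session" or "default graph"): something set up in one thread is simply not there in another. The sketch below fakes such a library with `threading.local`; `set_current_model`, `get_current_model` and `read_in_worker_thread` are hypothetical names, and this is an illustration of the general hazard, not a diagnosis of the TensorFlow failure in this issue.

```python
import threading

# Hypothetical stand-in for a library that keeps per-thread global state.
# Illustration only; this is not TensorFlow's implementation.
_state = threading.local()


def set_current_model(name):
    """Register ``name`` as the current model for the calling thread."""
    _state.model = name


def get_current_model():
    """Return the calling thread's current model, or None if unset."""
    return getattr(_state, "model", None)


def read_in_worker_thread():
    """Read the current model from inside a fresh worker thread."""
    result = {}

    def worker():
        result["model"] = get_current_model()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["model"]


set_current_model("simple_nn")
print(get_current_model())      # simple_nn  (the main thread sees it)
print(read_in_worker_thread())  # None       (the worker thread does not)
```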
To answer your original question: to use Dask with the multi-threaded scheduler, your code must be safe to run in multiple threads (most code is, just not TensorFlow). To use Dask with the multiprocessing or distributed schedulers, your code must be serializable (most code is). You can always use Dask with the single-threaded scheduler if you want to see how things work out. To try this, run the following line:
dask.set_options(get=dask.local.get_sync)
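The two requirements above can be checked roughly in plain Python before involving Dask. The sketch below is stdlib-only and does not call Dask: `is_picklable` tests the serialization requirement with a pickle round trip, and `run_candidates` (a hypothetical helper) loosely mimics the choice between a synchronous and a threaded scheduler. `fake_fit` is a placeholder for a real model fit. In newer Dask releases the single-threaded scheduler is selected with `dask.config.set(scheduler="synchronous")` rather than `dask.set_options`.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


def is_picklable(obj):
    """Return True if ``obj`` survives a pickle round trip.

    Process-based and distributed schedulers must serialize tasks and their
    arguments; objects holding locks, sessions or open handles usually fail.
    """
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def run_candidates(fit, candidates, scheduler="sync", max_workers=4):
    """Run ``fit`` on every candidate, serially or in a thread pool.

    Hypothetical helper; it only imitates the idea of Dask's schedulers.
    """
    if scheduler == "sync":
        # Everything runs in the calling thread, so errors surface directly.
        return [fit(c) for c in candidates]
    if scheduler == "threads":
        # ``fit`` must be thread-safe for this branch to be correct.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(fit, candidates))
    raise ValueError(f"unknown scheduler: {scheduler!r}")


def fake_fit(params):
    """Placeholder for fitting a model; returns a fake score."""
    return params["hidden_neurons"] / 1000


candidates = [{"hidden_neurons": n} for n in (100, 200, 300)]
print(run_candidates(fake_fit, candidates, scheduler="sync"))     # [0.1, 0.2, 0.3]
print(run_candidates(fake_fit, candidates, scheduler="threads"))  # [0.1, 0.2, 0.3]
print(is_picklable({"hidden_neurons": 100}))                      # True
print(is_picklable({"lock": threading.Lock()}))                   # False
```

If a fit works under `"sync"` but fails under `"threads"`, the code is likely not thread-safe; if `is_picklable` returns False for the estimator, the process-based schedulers will not be able to ship it to workers.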