dask / dask-searchcv

dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
BSD 3-Clause "New" or "Revised" License

dask-searchcv incompatible with Dask v0.18 #76

Closed jtromans closed 6 years ago

jtromans commented 6 years ago

I'm trying to run dask-searchcv 0.2.0 with Dask 0.18.1, and the 'distributed' scheduling back-end no longer picks up the tasks; they run in the submitting Python kernel rather than on the registered worker. The web UI confirms this: the tasks never show up there, and the console output appears in the submitting kernel rather than in the dask-worker.

By downgrading to Dask 0.17.5, this issue is resolved.

TomAugspurger commented 6 years ago

Can you provide a reproducible example? This works for me:

In [5]: from distributed import Client

In [6]: client = Client()

In [7]: from sklearn.datasets import load_digits
   ...: from sklearn.svm import SVC
   ...:
   ...: # Fit with dask-searchcv
   ...: from dask_searchcv import GridSearchCV
   ...:
   ...: param_space = {'C': [1e-4, 1, 1e4],
   ...:                'gamma': [1e-3, 1, 1e3],
   ...:                'class_weight': [None, 'balanced']}
   ...:
   ...: model = SVC(kernel='rbf')
   ...:
   ...: digits = load_digits()
   ...:
   ...: search = GridSearchCV(model, param_space, cv=3)
   ...: search.fit(digits.data, digits.target)
jtromans commented 6 years ago

Hi @TomAugspurger. I'll try this later this evening. However, I note that you aren't providing a remote scheduler IP address in this example. Perhaps that doesn't matter, but I'll confirm.

TomAugspurger commented 6 years ago

In theory, it shouldn't.

If you are connecting to a remote scheduler, I would verify that your versions of dask, distributed, and dask-searchcv match on all the machines (see client.get_versions).
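
A rough sketch of that check, in case it helps. It assumes the dictionary returned by client.get_versions() has 'client', 'scheduler', and 'workers' entries, each carrying a 'packages' mapping of package name to version string; the exact layout may differ between distributed releases. The helper name find_version_mismatches and the sample data (including the worker address) are made up for illustration.

```python
def find_version_mismatches(versions, packages=("dask", "distributed")):
    """Return {package: {node: version}} for packages whose versions differ.

    `versions` is assumed to look like the result of client.get_versions():
    {'client': {...}, 'scheduler': {...}, 'workers': {address: {...}}},
    where each node entry has a 'packages' mapping. This layout is an
    assumption and may not match every distributed release.
    """
    # Flatten client, scheduler, and every worker into one {node: info} map.
    nodes = {"client": versions["client"], "scheduler": versions["scheduler"]}
    for address, info in versions.get("workers", {}).items():
        nodes["worker " + address] = info

    mismatches = {}
    for package in packages:
        # A package missing on a node is recorded as None, which also
        # counts as a mismatch against nodes that do have it.
        seen = {name: info.get("packages", {}).get(package)
                for name, info in nodes.items()}
        if len(set(seen.values())) > 1:
            mismatches[package] = seen
    return mismatches


# Hypothetical data shaped like the assumed get_versions() result.
sample_versions = {
    "client": {"packages": {"dask": "0.18.1", "distributed": "1.22.0"}},
    "scheduler": {"packages": {"dask": "0.18.1", "distributed": "1.22.0"}},
    "workers": {
        "tcp://10.0.0.5:40000": {
            "packages": {"dask": "0.17.5", "distributed": "1.22.0"}
        },
    },
}

mismatched = find_version_mismatches(sample_versions)
```

Against a live cluster you would pass in client.get_versions() instead of the sample dictionary. An empty result means the listed packages agree everywhere; add 'dask_searchcv' to packages only if your distributed version reports it.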

jtromans commented 6 years ago

Example of v0.17.5 working correctly: https://www.youtube.com/watch?v=TD9elHu5ag4

Example of v0.18.1 working incorrectly: https://www.youtube.com/watch?v=ZXxZ9NvV8C0

@TomAugspurger please let me know if I can help to provide additional information.

TomAugspurger commented 6 years ago

Closing this in favor of https://github.com/dask/dask-ml/issues/249

TomAugspurger commented 6 years ago

This was fixed over in https://github.com/dask/dask-ml/pull/260

It'll be included in the next release of dask-ml, which will probably be out in the next week or two.

jtromans commented 6 years ago

Thanks