dask / dask-searchcv

dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
BSD 3-Clause "New" or "Revised" License

Add `scheduler` parameter #44

Closed: jcrist closed this 7 years ago

jcrist commented 7 years ago

Allow specifying the scheduler by name instead of passing in the get function directly. Scheduler can be one of:


Not fully set on this yet.

Pros:

Cons:

If we go this route, then I'd also add `n_jobs` as a parameter (matching scikit-learn), which would specify `num_workers` for the threading and multiprocessing schedulers, and be ignored by the others. We might also make `n_jobs=1` select the synchronous scheduler for every scheduler except distributed. The downside of supporting `n_jobs` here is that we'd probably want the default to match what dask does (`n_jobs = cpu_count()`) instead of what scikit-learn does (`n_jobs=1`). I'm fine with this, but it is a difference.
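A minimal sketch of the `n_jobs` rules described above, not the code from this PR. The helper name `resolve_scheduler` and the use of `n_jobs=None` to mean "use `cpu_count()`" are assumptions made for illustration:

```python
from multiprocessing import cpu_count


def resolve_scheduler(scheduler, n_jobs=None):
    """Return a (scheduler_name, num_workers) pair.

    Hypothetical helper: `n_jobs=None` stands in for the dask-style
    default of one worker per CPU.
    """
    if n_jobs is None:
        n_jobs = cpu_count()
    if n_jobs < 1:
        raise ValueError("n_jobs must be a positive integer or None")

    # The distributed scheduler manages its own workers, so n_jobs is ignored.
    if scheduler == "distributed":
        return scheduler, None

    # n_jobs=1 falls back to the synchronous scheduler for everything else.
    if n_jobs == 1:
        return "synchronous", None

    # Only the threading and multiprocessing schedulers take num_workers.
    if scheduler in ("threading", "multiprocessing"):
        return scheduler, n_jobs

    # Any other scheduler ignores n_jobs.
    return scheduler, None
```

For example, `resolve_scheduler("threading", 4)` gives `("threading", 4)`, while `resolve_scheduler("multiprocessing", 1)` gives `("synchronous", None)`.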

If we don't go this route, then I might add a scheduler_kwargs parameter instead, which would be forwarded to the get call. Not sure if any of the other keyword arguments would prove useful for this library though.
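As a rough illustration of the `scheduler_kwargs` alternative, the sketch below forwards a dict of extra keyword arguments to a `get` call. Both `toy_get` (a tiny sequential stand-in for a dask `get` function) and `compute_with` are hypothetical names, not part of dask or dask-searchcv:

```python
from operator import add

# Records the keyword arguments the stand-in scheduler was called with.
seen_kwargs = {}


def toy_get(dsk, keys, **kwargs):
    """Stand-in for a dask-style get(dsk, keys, **kwargs) function.

    Evaluates a tiny task graph sequentially; string arguments that are
    graph keys are treated as references to other tasks.
    """
    seen_kwargs.update(kwargs)

    def evaluate(key):
        task = dsk[key]
        if isinstance(task, tuple) and task and callable(task[0]):
            func, *args = task
            resolved = [
                evaluate(a) if isinstance(a, str) and a in dsk else a
                for a in args
            ]
            return func(*resolved)
        return task

    return [evaluate(k) for k in keys]


def compute_with(get, dsk, keys, scheduler_kwargs=None):
    """Call `get`, forwarding any extra scheduler keyword arguments."""
    return get(dsk, keys, **(scheduler_kwargs or {}))


graph = {"x": 1, "y": (add, "x", 10)}
result = compute_with(toy_get, graph, ["y"], scheduler_kwargs={"num_workers": 2})
```

Here `num_workers` reaches `toy_get` unchanged, which is all a `scheduler_kwargs` parameter would need to do.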

mrocklin commented 7 years ago

We might also consider adopting a convention like this within Dask. The term get is a pretty bad name. It actually comes all the way from the first commit.

jcrist commented 7 years ago

Ok, I've cleaned this up and added support for n_jobs, which has been asked for in the past.

A few potential issues:

mrocklin commented 7 years ago

I think that following joblib over dask names (like threading over threaded) makes sense for this interface. You could also support aliases so that all of sync, sequential, synchronous would map to the same value.

Actually, I like this idea because then we can establish these names in more parts of Dask where we might prefer threaded over threading.

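```python
# Map alias names onto one canonical scheduler name
scheduler_types = {
    'sync': 'synchronous',
    'sequential': 'synchronous',  # from joblib
    'threading': 'threaded',
    # ...
}

# `scheduler` is the user-supplied name; unknown names pass through unchanged
scheduler = scheduler_types.get(scheduler, scheduler)
```

mrocklin commented 7 years ago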

I would like to give people the option to provide names rather than get= functions.
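One way to accept either a name or a `get` function is sketched below. It is an assumption-laden illustration, not the merged implementation: `resolve_get`, `REGISTRY`, and the two `fake_*_get` functions are hypothetical, and the alias table contains only the entries shown earlier in this thread.

```python
# Aliases shown in the thread; the full table is elided there and is not
# guessed at here.
SCHEDULER_ALIASES = {
    "sync": "synchronous",
    "sequential": "synchronous",  # joblib's name
    "threading": "threaded",
}


def fake_sync_get(dsk, keys, **kwargs):
    """Stand-in for a synchronous get function."""
    return "ran synchronously"


def fake_threaded_get(dsk, keys, **kwargs):
    """Stand-in for a threaded get function."""
    return "ran with threads"


# Hypothetical registry from canonical names to get functions.
REGISTRY = {
    "synchronous": fake_sync_get,
    "threaded": fake_threaded_get,
}


def resolve_get(scheduler, registry):
    """Return a get function for a scheduler given by name or as a callable."""
    # A callable is assumed to already be a get function and is used as-is.
    if callable(scheduler):
        return scheduler
    if not isinstance(scheduler, str):
        raise TypeError("scheduler must be a name or a callable")

    # Normalize aliases, then look up the canonical name.
    name = SCHEDULER_ALIASES.get(scheduler, scheduler)
    try:
        return registry[name]
    except KeyError:
        raise ValueError(f"Unknown scheduler: {scheduler!r}") from None
```

With this shape, `resolve_get("sequential", REGISTRY)` and `resolve_get(fake_sync_get, REGISTRY)` both return `fake_sync_get`, so existing callers that pass a function keep working.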

jcrist commented 7 years ago

I liked the alias idea and added it. I think this is good to go. Merging.