dask / dask-searchcv

dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
BSD 3-Clause "New" or "Revised" License

Online fit - WIP #45

Open thomasgreg opened 7 years ago

thomasgreg commented 7 years ago

This is an attempt at issue #32.

This WIP:

- removes TokenIterator and main_token, which depend on the parameters and their ordering
- constructs tokens for the fit names uniquely from the parameters, without depending on a mapping; the dask graph is queried directly for previously encountered tasks
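To illustrate the idea of parameter-derived keys, here is a minimal sketch, not the code in this PR. It assumes a dask graph is a plain dict mapping keys to task tuples, and it uses hashlib as a stand-in for dask's own tokenization so that it runs with the standard library only. The names fit_key, add_fit and fit_placeholder are hypothetical.

```python
import hashlib
import json


def fit_placeholder(est_name, params, data_token):
    """Hypothetical stand-in for the real fit function."""
    return (est_name, params, data_token)


def fit_key(est_name, params, data_token):
    """Build a deterministic key for a fit task from its parameters.

    sort_keys=True makes the token independent of parameter ordering;
    default=repr is a fallback for values json cannot serialize.
    """
    payload = json.dumps([est_name, params, data_token],
                         sort_keys=True, default=repr)
    token = hashlib.md5(payload.encode()).hexdigest()
    return f"{est_name}-fit-{token}"


def add_fit(dsk, est_name, params, data_token):
    """Add a fit task unless the graph already holds an identical one.

    The graph itself is queried for the key, so no separate mapping
    of previously seen parameters is needed.
    """
    key = fit_key(est_name, params, data_token)
    if key not in dsk:
        dsk[key] = (fit_placeholder, est_name, params, data_token)
    return key
```

With this scheme, calling add_fit twice with the same parameters in a different order returns the same key and leaves a single task in the graph.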

The current approach evolved in part from getting familiar with the assumptions of the existing codebase, so I ended up being strict about keys and defensive in graph updates (see update_dsk). Passing around and managing a global seen mapping alongside dsk may achieve the same effect with minimal code change.
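As a rough sketch of what being defensive in graph updates could look like, assuming the graph is a plain dict: the helper below refuses to overwrite an existing key with a different task. The name update_graph is hypothetical and this is not the update_dsk from the PR, whose exact behaviour is not shown here.

```python
def update_graph(dsk, new_tasks):
    """Merge new_tasks into dsk without silently overwriting a key.

    Re-adding an identical task is a no-op; a key that already maps
    to a different task raises, since that would indicate a key clash.
    """
    for key, task in new_tasks.items():
        if key in dsk:
            if dsk[key] != task:
                raise ValueError(
                    f"key {key!r} already maps to a different task"
                )
            continue  # identical task already present
        dsk[key] = task
    return dsk
```

The trade-off is an extra comparison per key on every update, in exchange for catching key clashes early instead of computing the wrong task.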

I have committed a simpler solution that avoids a major refactoring.

Todo:

jcrist commented 7 years ago

Apologies for letting this sit so long - I'll try to give it a good review later today or sometime this weekend. Thanks for taking on this issue :).

thomasgreg commented 7 years ago

No worries :) ... I just found a bug, so I'm working on that and cleaning up the example.

TomAugspurger commented 6 years ago

Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml

https://github.com/dask/dask-ml/pull/221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse some components / structure from there. LMK if you want help with rebasing this on top of dask-ml.

stsievert commented 6 years ago

I'm not sure how much you'll be able to reuse from https://github.com/dask/dask-ml/pull/221: most of the framework there is built around _partial_fit_and_score, not the adaptive framework spelled out in https://github.com/scikit-learn/scikit-learn/pull/9599