Open thomasgreg opened 7 years ago
Apologies for letting this sit so long - I'll try to give it a good review later today or sometime this weekend. Thanks for taking on this issue :).
No worries :) .. just found a bug so working on that and cleaning the example
Apologies for letting this linger @thomasgreg. We're moving further development of dask-searchcv into https://github.com/dask/dask-ml
https://github.com/dask/dask-ml/pull/221 is implementing Hyperband. If you're interested in picking this up again, we could maybe reuse some components / structure from there. LMK if you want help with rebasing this on top of dask-ml.
I'm not sure how much you'll be able to reuse from https://github.com/dask/dask-ml/pull/221 – most of the framework there is built around `_partial_fit_and_score`, not the adaptive framework spelled out in https://github.com/scikit-learn/scikit-learn/pull/9599
This is an attempt at issue #32.
The following WIP:
- removes `TokenIterator` and `main_token`, which depend on the parameters and their ordering
- constructs tokens for the fit names uniquely from the parameters, without depending on a mapping; the dask graph is queried directly for previously encountered tasks

The current approach evolved in part out of becoming familiar with the assumptions of the existing codebase, so I ended up being strict about keys and defensive in graph updates (see `update_dsk`). Passing around and managing a global `seen` mapping alongside `dsk` may achieve the same effect with minimal code change.

Have committed a simpler solution which avoids a major refactoring.
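For reviewers, here is a rough sketch of the idea behind the token scheme described above. It is not the code in this PR. `param_token` and `add_fit` are hypothetical names, the graph is a plain dict as dask graphs are, and `hashlib` stands in for dask's own `tokenize` so the snippet runs without dask installed.

```python
import hashlib


def param_token(params):
    """Hypothetical helper: hash parameters independently of their ordering."""
    # Sorting by name makes {"C": 1, "tol": 0.1} and {"tol": 0.1, "C": 1} hash alike.
    canonical = repr(sorted(params.items()))
    return hashlib.md5(canonical.encode()).hexdigest()


def add_fit(dsk, est_name, params, X_name, y_name, split, fit_func):
    """Add one fit task to the graph unless an identical one is already there."""
    key = ("%s-fit-%s" % (est_name, param_token(params)), split)
    if key not in dsk:  # query the graph itself; no separate `seen` mapping
        dsk[key] = (fit_func, est_name, X_name, y_name, params)
    return key


# Two candidates with the same parameters in a different order share one task.
dsk = {}
k1 = add_fit(dsk, "svc", {"C": 1, "tol": 0.1}, "X", "y", 0, len)
k2 = add_fit(dsk, "svc", {"tol": 0.1, "C": 1}, "X", "y", 0, len)
```

Because the key is derived only from the inputs of the fit, the membership test on `dsk` is enough to detect a previously encountered task, which is why no iterator-based token or separate mapping is needed in this sketch.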
Todo:
- `DaskBaseSearchCV` instead of an unwieldy function
- (`dsk` is updated directly instead of using `seen`)