TomAugspurger opened this issue 6 years ago
I was thinking more about searching over pipelines with dask-searchcv, because that's a plug-in replacement that is just more efficient. But we could also think about large grid searches over a cluster; that's definitely also useful. I'm just a bit worried about time. We generally have too much material already, and I haven't had time to actually go through the material again for this year (I have some more talks / tutorials before this one).
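For context, a minimal sketch of what that replacement looks like, assuming dask-searchcv's estimators (which now live in `dask_ml.model_selection`); the scaler/classifier pipeline and parameter values here are just placeholders:

```python
# Sketch: dask-searchcv's GridSearchCV mirrors scikit-learn's interface,
# but builds a dask graph so early pipeline steps aren't refit for every
# parameter combination.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from dask_ml.model_selection import GridSearchCV  # drop-in for sklearn's GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="lbfgs")),
])

# Placeholder grid touching both pipeline steps.
param_grid = {
    "scale__with_mean": [True, False],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```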
Just mentioning that using dask as a joblib backend is definitely possible, though.
Yes, time is unfortunately tight, especially in introductory tutorials :/
Splitting from https://github.com/dask/scipy-tutorials-2018/issues/3
cc @amueller
I think the distributed joblib is the best value in terms of additional teaching time / usefulness. Teaching-wise it's "We've seen `n_jobs=-1` mean all the cores on a single machine. With this context manager, `n_jobs=-1` now means all the cores on a cluster!"

RandomForest (and some others) hardcode the joblib backend to use threading. After the next joblib release, I plan to open issues on scikit-learn to …
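A minimal sketch of that teaching point, assuming a recent joblib / dask.distributed / scikit-learn where the external `parallel_backend` context manager is respected; the local `Client()` stands in for a real cluster, and the estimator and data are placeholders:

```python
# Sketch: with the dask joblib backend active, n_jobs=-1 parallelizes over
# the workers of the connected cluster rather than just local cores.
from dask.distributed import Client
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

client = Client()  # stand-in for a real cluster; starts local workers here

X, y = make_classification(n_samples=2000, random_state=0)

with parallel_backend("dask"):
    # Inside this context manager, n_jobs=-1 means "all the cores on the cluster".
    scores = cross_val_score(SVC(gamma="scale"), X, y, cv=5, n_jobs=-1)

print(scores)
```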
A good example for now might be a large grid search. Something like a bigger version of the first example in https://mybinder.org/v2/gh/dask/dask-examples/master?filepath=machine-learning.ipynb
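As a rough sketch of what that bigger grid search might look like (the notebook's first example grid-searches an SVC; the grid sizes and parameter values below are placeholders, and the backend usage again assumes a recent joblib / dask.distributed):

```python
# Sketch: a deliberately large grid search farmed out to a cluster
# via the dask joblib backend.
import numpy as np
from dask.distributed import Client
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

client = Client()  # connect to the cluster (local workers as a stand-in)

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hundreds of (parameter combination, CV fold) fits.
param_grid = {
    "C": np.logspace(-3, 3, 13),
    "gamma": np.logspace(-4, 2, 13),
    "kernel": ["rbf", "poly"],
}

search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=-1)

with parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)
```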