TomAugspurger opened this issue 4 years ago
Is there a dataset or workflow on which it makes sense to perform this benchmark?
I'm more than happy to walk through performance profile information with anyone doing this work.
Also, as an output of this I'd love to see a blogpost
It's good to see more work on Dask-ML's model selection! I have some ideas I'd like to see implemented, and would love to see benchmarks on modifications (e.g., to resolve #532).
Is there a dataset or workflow on which it makes sense to perform this benchmark?
I have a benchmark at https://github.com/stsievert/dask-hyperband-comparison. This benchmark focuses on heavy computation, not large datasets.
I'm also curious if @dankerrigan has applications from within his workplace that would be both relevant and open (my guess is that this is hard, but it's worth asking :) )
from datetime import date

import dask_ml.datasets

df = dask_ml.datasets.make_classification_df(
    n_samples=1_000_000,
    n_features=1000,
    random_state=123,
    chunks=100,
    dates=(date(2019, 1, 1), date(2020, 1, 1)),
)
@mrocklin I'll see what I can do!
@dankerrigan I found clean separation of the model fitting and the searching to be useful. This allowed for quick iteration: I could make a small change and then quickly see performance differences. The searches I ran simulated model fitting and recorded scores, so they didn't require any data. This meant I could re-run the simulations without the same CPU or memory requirements.
My process looked something like this: record the number of partial_fit calls for each score, then have ReplayModel read in one model's scores/number of partial_fit calls and simulate computation by sleeping for a certain amount of time. The implementation of ReplayModel in Simulate-Run.ipynb is pretty vanilla and reads in a model history. IIRC, I got the history from IncrementalSearchCV.model_history_.
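A minimal sketch of what a ReplayModel along these lines might look like (the real implementation lives in Simulate-Run.ipynb; the attribute names and exact replay logic here are illustrative, not copied from that notebook):

```python
import time


class ReplayModel:
    """Replay one model's recorded history, simulating compute with sleeps.

    `scores` is the ordered list of scores one model produced during a real
    search (e.g., extracted from IncrementalSearchCV.model_history_).
    """

    def __init__(self, scores, fit_time=1.0, score_time=1.5):
        self.scores = list(scores)
        self.fit_time = fit_time      # simulated cost of one partial_fit call
        self.score_time = score_time  # simulated cost of one score call
        self._calls = 0               # number of partial_fit calls so far

    def partial_fit(self, X=None, y=None):
        time.sleep(self.fit_time)  # stand-in for real training computation
        self._calls += 1
        return self

    def score(self, X=None, y=None):
        time.sleep(self.score_time)
        # Replay the recorded score for this point in training
        idx = min(self._calls, len(self.scores) - 1)
        return self.scores[idx]
```

Because nothing here touches real data, a search driving many of these objects can be re-run cheaply with tiny (or zero) sleep times.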
The one exception is the amount of time to sleep in partial_fit and score to simulate the required computation. I carefully chose values of 1 second and 1.5 seconds respectively. There are two facts behind these values:

- score is called with a dataset 3× larger than the dataset provided to partial_fit.
- A single partial_fit call will take about 1.5–3× as long as a single score call with the same data.

Good benchmarks on modern neural nets on GPUs are at https://github.com/soumith/convnet-benchmarks (also see "On automatic differentiation" by Andreas Griewank for proof that flops(partial_fit) <= 5 flops(score)).

FYI, I'm working a lot on improving the joblib/dask integration these days. Among other things, I'm building a benchmark suite for joblib using the dask backend for a variety of workloads and use-cases, including things like scikit-learn cross-validations, GridSearch, etc.
So I'm very interested by this.
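For reference, the kind of workload being benchmarked here might be sketched like this (a sketch, not the actual benchmark suite; the dataset, estimator, and parameter grid are placeholders, and the threading fallback exists only so the snippet runs without dask.distributed installed):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
search = GridSearchCV(
    SGDClassifier(random_state=0),
    {"alpha": [1e-4, 1e-3, 1e-2]},
    cv=3,
)

backend = "dask"  # ships scikit-learn's internal joblib calls to a Dask cluster
try:
    from dask.distributed import Client
    client = Client(processes=False)  # in-process cluster, for illustration only
except Exception:
    backend = "threading"  # fallback when dask.distributed isn't available

with joblib.parallel_backend(backend):
    search.fit(X, y)

print(search.best_params_)
```

With a real deployment, the Client would point at an existing cluster, and the interesting measurements are how fit time scales with grid size and worker count under the dask backend versus joblib's default backends.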
I'm very glad to hear that you're interested. I'm also quite interested to see how I can help. Would it make sense for a few of us to get together for a quick call? I would enjoy learning more about what you're up to.
Would it make sense for a few of us to get together for a quick call?
I think that's a great idea. Given the current circumstances, I'm pretty much available whenever during UTC daytime.
Yes, I would enjoy a call as well!
I think that I'm the western-most person who would be interested in this. My day starts around 14:30 UTC (7:30 US Pacific, 10:30 US Eastern). I suggest that if people are interested they click the Heart icon on this comment. I'll then send out an e-mail with some scheduling options.
Invitations sent. I focused on 7–10am US Pacific (2–5pm UTC).
I’ll aim to join too. Have been looking at a couple of different HPO approaches with a focus on GPU options. Thanks!
2020-04-28, 9am US Pacific, 4pm UTC (thank you Europeans for staying late)
@dankerrigan my apologies, but I chose a time that excluded you. It was that or exclude others who were singletons from an organization. Hopefully @mmccarty can represent your views a bit.
@andremoeller my apologies but I don't have your e-mail. Regardless, the link below should have the relevant information for you to join.
https://docs.google.com/document/d/1USYpqW-pq5kfDoumoVC5gdXhIeGyrDLJ8dVNie5gXdc/edit?usp=sharing
It'd be nice to have some benchmarks for how our different hyperparameter optimizers perform. There are a few comparisons that would be useful:

1. GridSearchCV(Pipeline(...)) for dask_ml.model_selection.GridSearchCV and sklearn.model_selection.GridSearchCV. We'd expect Dask-ML's to perform better the more CV splits there are and the more parameters that are explored early on in the pipeline (https://github.com/dask/dask-ml/issues/141 has some discussion).
2. scikit-learn's GridSearchCV using joblib with the Dask backend. The items in the for loop are executed on the Dask cluster. There are some issues with the backend (https://github.com/joblib/joblib/issues/1020, https://github.com/joblib/joblib/issues/1025). Fixing those isn't in scope for this work, but we'd like to have benchmarks to understand the current performance and measure the speedup from fixing those.
cc @dankerrigan. This is more than enough work, I think. If you're able to make progress on any of these (or other things you think are important), it'd be great.
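The "for loop" mentioned above is joblib's Parallel(...)(delayed(f)(x) for x in ...) pattern; a minimal sketch of what gets shipped to the cluster (fit_and_score here is a trivial stand-in, not a real scikit-learn call):

```python
from joblib import Parallel, delayed


def fit_and_score(params):
    # Stand-in for fitting one CV split with one parameter setting
    return sum(params.values())


grid = [{"alpha": a, "l1_ratio": l} for a in (1, 2) for l in (0, 5)]

# Under `with joblib.parallel_backend("dask"):` each item in this generator
# expression is executed on the Dask cluster instead of in local workers.
results = Parallel(n_jobs=2)(delayed(fit_and_score)(p) for p in grid)
print(results)  # [1, 6, 2, 7]
```

Benchmarking this loop shape directly, independent of any estimator, is one way to isolate backend overhead from scikit-learn's own costs.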