topepo opened 3 years ago
Hey, Max, glad to see you here. I was writing about forking and then I decided to perform a benchmark to enrich the vignette. I was expecting to corroborate your findings but I ended up with counter-intuitive results.
tl;dr: pure forking or pure threading wasn't the best: 2 threads with 4 workers was the fastest setup.
see here https://curso-r.github.io/treesnip/articles/threading-forking-benchmark.html
Do you think that it is worth it to consider these combinations? Or is it better to stick with the simple rule of thumb (tune -> forking; fit -> thread)?
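For concreteness, a mixed setup like the one in the benchmark could be sketched like this (worker/thread counts, the doParallel backend, and xgboost's nthread engine argument are illustrative choices here, not taken from the vignette):

```r
library(doParallel)
library(parsnip)
library(tune)

# 4 workers handle the resampling loop in tune_grid()...
registerDoParallel(cores = 4)

# ...while each individual model fit uses 2 threads internally,
# for 4 x 2 = 8 busy threads in total.
spec <- boost_tree(trees = 500) %>%
  set_engine("xgboost", nthread = 2) %>%
  set_mode("regression")
```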
That's really interesting! TBH I'm surprised that a combination like that works at all. Can you make a plot with the speed-up (sequential time / parallel time) on the x-axis?
I might run some of these too locally this weekend.
@topepo I'm running more benchmarks here and I think I spotted a potential issue you might want to check yourself to confirm: when I set vfold_cv(v = 3), only 3 workers were used, even with tune_grid() set to fit lots of different models. When I set vfold_cv(v = 8), I saw all 8 of my cores at 100%. My hypothesis is that tune_grid() is forking only over the folds loop.
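A minimal way to reproduce what I'm describing might look like this (a sketch with made-up data and grid sizes; watch a CPU monitor while it runs and compare the two resample objects):

```r
library(tidymodels)

# 8 workers available, but parallelism appears to follow the folds
doParallel::registerDoParallel(cores = 8)

folds_3 <- vfold_cv(mtcars, v = 3)  # only ~3 cores stay busy
folds_8 <- vfold_cv(mtcars, v = 8)  # all 8 cores stay busy

spec <- boost_tree(trees = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

# Many grid points per fold, yet utilization tracks v, not v * grid
res <- tune_grid(spec, mpg ~ ., resamples = folds_3, grid = 20)
```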
Hi,
I'm using doFuture/doRNG parallel processing for my tidymodels workflows (for tuning) with other engines (apparently I need to load doFuture before using doRNG, but I'm still trying to confirm that):
library(doFuture)
registerDoFuture()
plan(multisession)
doRNG::registerDoRNG()
It fails when using treesnip with catboost. I get an error: Error in pkg_list[[1]]: subscript out of bounds.
This is because catboost and treesnip are not loaded on the workers (I can't fork because of RStudio; there is a consensus that you shouldn't fork from RStudio).
It works when I "register" the dependencies manually (see https://github.com/tidymodels/tune/issues/205):
set_dependency("boost_tree", eng = "catboost", "catboost")
set_dependency("boost_tree", eng = "catboost", "treesnip")
It could be useful to document that somewhere, or maybe there is a place where the set_dependency() calls could be included.
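Putting it all together, the sequence I ended up with looks roughly like this (a sketch: the set_dependency() calls run in the main session before the workers are started, so tune can load the packages on each worker):

```r
library(doFuture)
library(treesnip)

# Teach parsnip that this engine needs these packages on the workers
parsnip::set_dependency("boost_tree", eng = "catboost", "catboost")
parsnip::set_dependency("boost_tree", eng = "catboost", "treesnip")

# Then register the parallel backend as before
registerDoFuture()
plan(multisession)
doRNG::registerDoRNG()

# ... tune_grid() now attaches catboost/treesnip on each worker
```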
I would suggest that, when using tune, the standard foreach parallelism be used, and that the model-specific threading methods be used if just parsnip is being used to fit. Generally, parallelizing the resamples is faster than parallelizing the individual models (see the xgboost example). We always try to parallelize the longest-running loop.
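That rule of thumb could be sketched as follows (xgboost's nthread engine argument is used for illustration; other engines expose their own threading controls):

```r
library(parsnip)

# Tuning: parallelize the resamples with foreach, keep fits single-threaded
doParallel::registerDoParallel(cores = 8)
spec_tune <- boost_tree(trees = tune()) %>%
  set_engine("xgboost", nthread = 1) %>%
  set_mode("regression")

# Final fit: no resampling loop left, so let the engine use the threads
spec_fit <- boost_tree(trees = 500) %>%
  set_engine("xgboost", nthread = 8) %>%
  set_mode("regression")
```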