Dekermanjian closed this issue 4 years ago.
Hi, thank you for the quick response. I read through the vignette, but I still can't get it to work. First, I want to make clear that the future_lapply() calls are not nested inside the function: one future_lapply() runs, then the next future_lapply() runs. I tried plan(list(multisession, multisession)), but it still ran the first in parallel and the second sequentially.
Read the section 'Built-in protection against recursive parallelism' in the vignette.
Okay, I read through the section, but I am still not sure whether my case is a nested one. For example, if I run the two future_lapply() calls outside of the function, they work as expected. That is, if instead of

    function(x) {
      plan(multisession)
      future_lapply(...)  # code here; only this one runs in parallel
      future_lapply(...)  # code here
    }

I run

    plan(multisession)
    future_lapply(...)  # do stuff here
    future_lapply(...)  # do stuff here

then both future_lapply() calls run in parallel. That is what I expect to happen inside the function, but it doesn't. I tried multiple combinations: plan(list(multisession, sequential)) runs the first in parallel and the second sequentially, and plan(list(sequential, multisession)) runs both sequentially. No combination runs the two (non-nested) future_lapply() calls in parallel. Again: one future_lapply() in parallel, then the other future_lapply() in parallel, not nested.
A couple of comments:
You should avoid calling plan() inside functions, cf. help("plan"). The philosophy is that you should leave that decision to the end user.
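A minimal sketch of that advice, assuming the future and future.apply packages are installed. The function my_fun() is hypothetical: it calls future_lapply() twice in a row but never calls plan(), so the caller picks the backend once, outside the function, and both calls are then distributed over the workers of that plan. The choice of workers = 2 is arbitrary.

```r
library(future)
library(future.apply)

# Hypothetical package-style function: it uses futures but does not call
# plan(), so the parallel backend is left to whoever calls it.
my_fun <- function(x) {
  y <- future_lapply(x, function(v) v^2)    # first parallel map
  z <- future_lapply(y, function(v) v + 1)  # second parallel map
  unlist(z)
}

# The end user chooses the backend once, outside the function ...
oplan <- plan(multisession, workers = 2)
res <- my_fun(1:4)
# ... and restores the previous plan afterwards.
plan(oplan)
res
```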
plan(list(multisession, multisession)) is explained in the vignette: it effectively becomes plan(list(multisession, sequential)). This is to avoid frying the user's computer. If you have 8 cores and this was not protected against, it would run 8 x 8 = 64 parallel workers, which would overload your computer. If your machine has 64 cores, you'd end up with 64 x 64 = 4,096 parallel workers, which would bring it to a full stall.
That paragraph in the vignette goes on to show that you can override this by explicitly specifying the number of workers you want at each level. So, you can specify:
    plan(list(tweak(multisession, workers = 64), tweak(multisession, workers = 64)))
but it's not a good idea. To make sure you run at most 64 parallel workers at the same time, you can split the 64 cores between the two layers. A few examples:
    plan(list(tweak(multisession, workers = 2), tweak(multisession, workers = 32)))
    plan(list(tweak(multisession, workers = 4), tweak(multisession, workers = 16)))
    plan(list(tweak(multisession, workers = 8), tweak(multisession, workers = 8)))
    plan(list(tweak(multisession, workers = 16), tweak(multisession, workers = 4)))
    plan(list(tweak(multisession, workers = 32), tweak(multisession, workers = 2)))
Which combination to pick depends on what is being done.
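For illustration, the splits listed above can be generated rather than typed by hand. This sketch uses only base R; exact_splits() is a hypothetical helper (not part of the future package) that returns every (outer, inner) pair whose product is exactly N, skipping pairs where one layer would get a single worker, and then builds the matching plan() calls as text.

```r
# Hypothetical helper: all (outer, inner) worker counts whose product is
# exactly n, leaving out the splits where either layer gets only 1 worker.
exact_splits <- function(n) {
  outer <- seq_len(n)
  outer <- outer[n %% outer == 0]  # keep the divisors of n
  splits <- data.frame(outer = outer, inner = as.integer(n %/% outer))
  splits <- splits[splits$outer > 1 & splits$inner > 1, ]
  rownames(splits) <- NULL
  splits
}

s <- exact_splits(64)
# Turn each split into the matching nested plan() call, as text only.
calls <- sprintf(
  "plan(list(tweak(multisession, workers = %d), tweak(multisession, workers = %d)))",
  s$outer, s$inner
)
writeLines(calls)
```

The helper only enumerates candidates; it says nothing about which one suits a given workload.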
Thank you for your help. I had tried plan(list(tweak(multisession, workers = 12), tweak(multisession, workers = 12))), but from within the function. I will try specifying the plan outside of the function instead. However, I am still unsure about the values. In my case I have 6 cores and 12 threads; would I need a multiple of 6 * 6 = 36? That is, instead of 12 and 12 workers, would I need something like 2 and 18? How does splitting the workers between the two levels work?
If you have N cores on your machine to play with, then when splitting them up into (N1, N2), you should make sure that N1 * N2 <= N so that you do not overuse your cores.
FYI, availableCores() will give you N.
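A minimal base-R sketch of the N1 * N2 <= N rule. fits() and balanced_split() are hypothetical helpers, not part of the future API. In practice N would come from availableCores(); the example assumes it reports 12 on the 6-core/12-thread machine mentioned above, which may not hold if the R session is restricted to fewer cores. Picking a roughly balanced split is just one arbitrary choice; as noted above, the best split depends on the workload.

```r
# TRUE if running n_outer x n_inner workers stays within n cores.
fits <- function(n_outer, n_inner, n) n_outer * n_inner <= n

# Hypothetical helper: pick a roughly balanced split with
# n_outer * n_inner <= n (any remainder cores are left idle).
balanced_split <- function(n) {
  n_outer <- max(1L, as.integer(floor(sqrt(n))))
  c(outer = n_outer, inner = as.integer(n %/% n_outer))
}

n <- 12  # assumption; e.g. n <- future::availableCores()
fits(12, 12, n)  # 144 workers on 12 cores: FALSE
fits(2, 18, n)   # 36 workers on 12 cores: FALSE
balanced_split(n)
```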
Thank you. I appreciate all the help.
You're welcome.
I have written a function that uses multiple future_lapply() calls; however, only the first one runs in parallel.

    function(x) {
      plan(multisession)
      future_lapply(...)  # code here; only this one runs in parallel
      future_lapply(...)  # code here
    }
I assume there is some argument I am missing for this to work inside of a function. Any advice is greatly appreciated.