HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

The effective number of parallel jobs decreases during the computation when the total number of jobs is larger than number of cores #71

zhengchencai closed this issue 2 years ago

zhengchencai commented 2 years ago

Hello,

Sorry for asking the same question again; I am posting it here in case it is more doFuture related. I am fitting n = 1000 Stan models on 64 cores using the doFuture package. Below is my code:

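library(doFuture)  # also attaches foreach and future
library(doRNG)     # provides %dorng%

registerDoFuture() # assumed: route foreach through the future framework
plan(multisession)

fit_all <- foreach(
  ifit = fit_idx # length(fit_idx) = 1000
) %dorng% {
  # Each fit runs 4 chains in parallel, so one worker occupies 4 threads
  mod$sample(
    data = data_list[[ifit]], iter_warmup = 1000, iter_sampling = 1000,
    chains = 4, parallel_chains = 4, show_messages = FALSE, output_dir = save_csv
  )
}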

Each model fit uses 4 cores/threads in parallel, so with 128 threads I should have 32 models fitting at the same time. This is true at the beginning of the computation: all 64 cores (128 threads) were indeed in use. I expected that whenever one fit finishes (freeing 4 threads), another fit would start, so that all CPUs stay busy until only the last few models remain, at which point the number of active CPUs would drop until everything finishes.

However, the total number of working CPUs decreased little by little instead. Around the middle of the computation only about 32 cores were working, and this kept decreasing until only 4 cores were working, with the remaining models computed one after another, which increased the total computation time a lot. In other words, the effective number of parallel jobs decreased gradually down to 4 (parallel_chains = 4). Could you please help me fix this problem? I guess it is caused by parallel_chains = 4 within each model fit, but I don't know the exact reason.
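For reference, the thread budget described above can be written out explicitly. This is only a minimal sketch: the helper name `workers_for` is hypothetical, and the numbers are the ones quoted in this post. Without a `workers` argument, `plan(multisession)` defaults to `availableCores()` workers, so the expectation of 32 concurrent fits only holds if the worker count is capped accordingly. Whether this accounts for the slowdown is covered in the linked discussion, not here.

```r
# Hypothetical helper: how many model fits can run at once if each fit
# occupies `chains_per_fit` threads out of `total_threads`.
workers_for <- function(total_threads, chains_per_fit) {
  stopifnot(chains_per_fit >= 1, total_threads >= chains_per_fit)
  total_threads %/% chains_per_fit
}

# Values taken from the question: 128 threads, parallel_chains = 4.
n_workers <- workers_for(total_threads = 128L, chains_per_fit = 4L)
print(n_workers)

# With the future package loaded, the cap would be applied as:
#   plan(multisession, workers = n_workers)
```

Setting `workers` explicitly keeps the product of workers and `parallel_chains` at the number of hardware threads, rather than leaving it to the default.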

Thank you very much.

HenrikBengtsson commented 2 years ago

Answered in https://github.com/HenrikBengtsson/future/discussions/644#discussioncomment-3712438.

PS. Sorry for the delay. I've been travelling. I only had time to move your original question here over to the 'Future Discussions'. I'm trying to keep questions and discussions there, so we can reserve the issue trackers for bugs and feature requests.