amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0
428 stars 107 forks source link

WIP: fix excessive thread spawning #572

Closed vkhodygo closed 4 months ago

vkhodygo commented 1 year ago

This should partially resolve #566

stefvanbuuren commented 1 year ago

@thomvolker Could you form an opinion on whether setting these environmental variables would be a good idea?

vkhodygo commented 1 year ago

@stefvanbuuren @thomvolker This is just a draft, it's not supposed to be merged as it is now. I'd like to have a discussion beforehand. Also, for the sake of clarity, by cores below I mean both logical and physical cores.

Let's take a look at the signature of futuremice:

futuremice(
  data,
  m = 5,
  parallelseed = NA,
  n.core = NULL,
  seed = NA,
  use.logical = TRUE,
  future.plan = "multisession",
  packages = NULL,
  globals = NULL,
  ...
)

What I'd like to suggest is the following:

  1. Drop use.logical. Whereas hyper-threading does not result in 2x boost (I observed something like 1.7), the only case I saw when people didn't use it was when Meltdown was announced (yes, I know this is anecdotal evidence). However, the whole HT simply got switched off. Less confusion, less code to maintain, fewer variables to take into account.
  2. In the case Sys.setenv() stays globals must not override them.
  3. If int(n.core / m) >= 2 allow multiple cores per session via Sys.setenv(). That would mean multiple checks, however, that should boost the performance a bit. At the same time, ranger must be limited separately since this is AFAIK the only method package that's out of our control.
  4. "multicore" plan must not have multiple threads per fork. That's considered a bad practice.
  5. "cluster" plan is a completely different beast, I'm not sure how to deal with it.
thomvolker commented 1 year ago

My two cents: I think it is bad practice to permanently change system variables on the user's end, especially when it is done implicitly. Perhaps we could ask the user whether this is something they would want to do, but in principle I think we should leave the system variables unchanged. Moreover, I'm afraid this implementation won't work, because for example OpenMP apparently ignores changes to the system variables if these occur after the program has started (see https://stackoverflow.com/questions/27319619/how-can-i-set-the-number-of-openmp-threads-from-within-the-program/27320691#27320691).

All in all, I think a fix on our end should limit the number of threads within the imputation functions, not by changing the user's system variables. In the functions that are inherent to mice, like pmm, we can specify the number of threads within the C++ code. When we depend on functions in other packages, we have to see whether it is possible to specify an argument to only use a single thread.

thomvolker commented 1 year ago

I just spend some time to dive into this a little further, and I think the problem lies mainly with C++ code that implicitly operates in parallel, as I don't think mice() itself parallelizes operations. Then the solution would probably be to limit the number of threads that gets invoked in the C++ code, which might be done by setting the number of threads and processes in BLAS through a C++ function that gets called in futuremice(). An alternative is to rely on the R-package RhpcBLASctl and use its functionality to limit the number of cores/threads that can be used internally. However, I don't know whether this (a) is sufficient to solve all implicit parallelization problems; (b) works for the random forest code; and (c) is platform agnostic (I don't think it is, as the package seems to depend on how R is built). @vkhodygo can you check whether this package works to solve the problem you experienced?

stefvanbuuren commented 1 year ago

||| Then the solution would probably be to limit the number of threads that gets invoked in the C++ code, which might be done by setting the number of threads and processes in BLAS through a C++ function that gets called in futuremice()

This sounds like a solution. How difficult is this to implement? And do we know whether "rf" would obey this setting, also across platforms?

vkhodygo commented 7 months ago

My bad, I was occupied with other things and also took some annual leave after that. @thomvolker Do you mind elaborating on this one?

stefvanbuuren commented 4 months ago

Closing because I am unsure whether this PR improves mice. @vkhodygo Please feel free to reopen.