Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel
65 stars 29 forks source link

start TransientMulticoreParam, ensuring clean-up of environments #244

Closed mtmorgan closed 1 year ago

mtmorgan commented 1 year ago
mtmorgan commented 1 year ago

Hi @Jiefei-Wang not sure what you think of this?

TransientMulticoreParam uses bpstart,TransientMulticoreParam to 'clean' the state used, but the existing code did not call bpstart...

Also, the conditional test of !bpisup(oldParam) is irrelevant, since we only get to this point under that condition. And now that bpstart() is called, .bpstart_set_rng_stream() is also called.

TransientMulticoreParam() is not exported, and is not used elsewhere in the code.

Jiefei-Wang commented 1 year ago

I haven't checked the code but I'm under the impression that we avoid using bpstart for TransientMulticoreParam on purpose.

We want to reduce the amount of data transfer. Without calling bpstatt, the master will prepare everything the worker needs and directly copy itself. Therefore, the worker will inherit the data from the master. However, if we call bpstart in bpinit, the data is not ready yet(has not been divided into tasks) so we still need to use the SNOW connection to send the data to the worker.

As you said, if TransientMulticoreParam uses bpstart,TransientMulticoreParam to 'clean' the state used, we might want to rename it and call it after parallelization.

mtmorgan commented 1 year ago

Thanks Jiefei.

Actually TransientMulticoreParam avoids serialization during .send_to() where it forks a process directly and so doesn't serialize anything. I think it's OK to invoke bpstart() as part of the constructor.

Historically this comment https://github.com/Bioconductor/BiocParallel/pull/156#issue-1014566729 provides a justification and performance test case. Under Bioc-3.14 I see

> f <-
+     function(param, y)
+ {
+     bplapply(1:100, function(x, y) sum(y), y = y, BPPARAM = param)
+ }
>
> y <- rnorm(1000000)
> param = MulticoreParam(tasks = 100)
> system.time(f(param, y)) # less than 1s
   user  system elapsed
  0.607   1.261   0.410
> param <- bpstart(MulticoreParam(tasks = 100))
> system.time(f(param, y)) # more than 4s
   user  system elapsed
  0.759   0.686   2.383

showing the benefit of TransientMulticoreParam. But in the current (patched or unpatched) devel I see

> param = MulticoreParam(tasks = 100)
> system.time(f(param, y)) # less than 1s
   user  system elapsed
  3.257   1.450   0.534
> param <- bpstart(MulticoreParam(tasks = 100))
> system.time(f(param, y)) # more than 4s
   user  system elapsed
  0.273   0.056   0.640

so I think the performance provided by TransientMulticoreParam has been addressed by other changes -- maybe in a different pull request it should just be removed to simplify the code base at very modest performance cost, or actually made even more performant (but that would probably be by improving the path to .send_to(), and benefit all Param?)?