HenrikBengtsson / BiocParallel.FutureParam

:rocket: R package: BiocParallel.FutureParam - Use Futures with BiocParallel
https://BiocParallel.FutureParam.futureverse.org

nested remote Biocparallel #1

Open hummuscience opened 6 years ago

hummuscience commented 6 years ago

It is me again :)

Does nesting of futures also work with BiocParallel.FutureParam, as it does with doFuture and future?

Does this locally:

library("BiocParallel")
register(MulticoreParam(36))

Translate to this when working with a remote machine:

library("BiocParallel.FutureParam")
register(FutureParam())
plan(list(tweak(remote, workers = "monster"), multicore))

With the same syntax of %->% to "peel off" futures?

HenrikBengtsson commented 6 years ago

Hey, and congrats on the first issue posted here :)

Correct, the problem is the same as you observed with doFuture: the %<-% / future() functions of the future package do not know about the doFuture+foreach and the BiocParallel.FutureParam+BiocParallel frameworks. The future package only passes down the plan() stack to the workers; it does not re-register the backends with foreach and BiocParallel there. The workaround for now is to re-register manually inside each future, which is less than ideal.

Example

Here is a minimal example showing what the problem currently is and the manual workaround:

> library("BiocParallel.FutureParam")
> register(FutureParam())
> plan(multisession)

# The BiocParallel back-end that master will use
> bpparam()
class: FutureParam
  bpisup: TRUE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE
  bplogdir: NA

# The BiocParallel back-end that a worker will use
> bp %<-% bpparam()
> bp
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE
  bpRNGseed: 
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

Manual workaround:

> bp %<-% { register(FutureParam()); bpparam() }
> bp
class: FutureParam
  bpisup: TRUE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bptimeout: 2592000; bpprogressbar: FALSE
  bplogdir: NA

Action

My plan is to add a mechanism to the future package that lets doFuture, BiocParallel.FutureParam, (your favorite higher-level future orchestration API here) register an "onEntry" hook function that is called whenever %<-% / future() is evaluated. With this in place, doFuture and BiocParallel.FutureParam can automatically re-register what they need in nested calls.
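To make the idea concrete, here is a minimal base-R sketch of what such a hook mechanism could look like. Nothing in it is the future package's actual API: `register_on_entry()`, `run_with_hooks()`, and the `state` environment are hypothetical names invented for illustration, and the "worker" is simulated by a fresh environment rather than a separate R process.

```r
# Hypothetical sketch only: none of these names exist in the future package.

# Registry of hook functions contributed by higher-level frameworks
on_entry_hooks <- list()

# A framework (e.g. a BiocParallel adapter) would call this once at load time
register_on_entry <- function(name, hook) {
  stopifnot(is.function(hook))
  on_entry_hooks[[name]] <<- hook
  invisible(name)
}

# Stand-in for future(): builds a fresh "worker" state, runs every
# registered hook on it, and only then evaluates the user's function
run_with_hooks <- function(fun) {
  state <- new.env()
  state$backend <- "default"  # what a worker would use when no hook ran
  for (hook in on_entry_hooks) hook(state)
  fun(state)
}

# Without any hook, the simulated worker falls back to its default back-end
before <- run_with_hooks(function(state) state$backend)

# The adapter registers a hook that re-registers its back-end on entry
register_on_entry("FutureParam", function(state) state$backend <- "FutureParam")

# With the hook in place, the simulated worker sees the adapter's back-end
after <- run_with_hooks(function(state) state$backend)
print(c(before = before, after = after))
```

The point of the design is that the user's code stays unchanged: the framework, not the user, takes care of re-registering its back-end each time a future is entered.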

hummuscience commented 6 years ago

Just to confirm, after fiddling around a little to understand what you meant, this code worked :)

This is an example using DESeq2, a widely used RNA-seq analysis package. Sweeet!

library("BiocParallel.FutureParam")
library("DESeq2")

plan(list(tweak(remote, workers = "monster"), multicore))

{ register(FutureParam()); MulticoreParam() }

dds.animals %<-% DESeq(dds.animals, parallel = TRUE)
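Going by the manual workaround shown earlier in this thread, the registration presumably has to happen inside the future expression itself, so that it runs in the R session on the remote machine and not only locally. A minimal sketch of that variant, untested here, assuming `dds.animals` is an existing DESeqDataSet and "monster" is a reachable host as in the plan() call above:

```r
library("BiocParallel.FutureParam")  # as used in this thread; provides FutureParam()
library("DESeq2")

# Outer layer: the remote machine "monster"; inner layer: forked workers on it
plan(list(tweak(remote, workers = "monster"), multicore))

# Register FutureParam inside the future, so that the remote R session
# hands BiocParallel work to the inner layer of the plan() stack
dds.animals %<-% {
  register(FutureParam())
  DESeq(dds.animals, parallel = TRUE)
}
```

Calling `bpparam()` inside the same braces, as in the workaround above, is a quick way to check which back-end the remote session actually ends up with.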