HenrikBengtsson closed this issue 8 years ago.
It might be prudent to add a warning if the plan is multicore, but otherwise that seems useful.
I'm not sure I understand: why would we need a warning if multicore futures are used?
Nvm, I see you are silently handling the number of jobs the user spawns so that they don't fork-bomb themselves.
Yes, I took a conservative approach and inserted a `plan(eager)` at the beginning of every multicore expression.
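In later versions of future, `eager` was superseded by `sequential`, and this protection became the default: unless a nested strategy is given explicitly, futures created inside a future are resolved sequentially in the same process. A minimal sketch that checks this via process IDs, assuming a Unix-like OS where `multicore` actually forks (elsewhere it falls back to sequential and the check holds trivially):

```r
library(future)

## Outer level: forked workers (falls back to sequential where forking
## is unsupported, e.g. on Windows or in RStudio)
plan(multicore)

f <- future({
  ## No nested strategy was given, so this inner future is resolved
  ## sequentially inside the same (forked) process
  inner <- future(Sys.getpid())
  c(outer = Sys.getpid(), inner = value(inner))
})
res <- value(f)
print(res)

plan(sequential)
```

A list of strategies can be passed to `plan()` to choose the nested strategy explicitly.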
There's also `parallel::mcaffinity()`; I don't have much experience with it, but maybe support for it (and similar features) should be added at some point. See also https://stat.ethz.ch/pipermail/r-devel/2012-December/065313.html.
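For reference, the current CPU affinity of the R process can be queried with `parallel::mcaffinity()`, which is documented to return `NULL` where affinity is not supported. This sketch only reads the mask; the `.Platform$OS.type` guard is a precaution based on the assumption that the function belongs to parallel's Unix-only fork API:

```r
## Read (but do not change) the set of CPUs this R process may run on.
## Assumption: mcaffinity() is only usable on Unix-alikes; it returns NULL
## where affinity is unsupported (e.g. macOS).
aff <- if (.Platform$OS.type == "unix") parallel::mcaffinity() else NULL

if (is.null(aff)) {
  message("CPU affinity is not supported on this platform")
} else {
  message("This process may run on CPUs: ", paste(aff, collapse = ", "))
}
```

In parallel itself, affinity can also be set per forked job via the `mc.affinity` argument of `mcparallel()` and the `affinity.list` argument of `mclapply()`.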
I also try to make future agile with respect to the number of cores allocated to the R session, which is not always the same as the total number of cores on the machine; cf. `availableCores()`. See https://github.com/HenrikBengtsson/future/issues/22.
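A small comparison of the two numbers, assuming a recent future version (where `availableCores()` is re-exported from the parallelly package):

```r
library(future)

## Total number of (logical) cores on the machine; may be NA on some systems
n_machine <- parallel::detectCores()

## Cores this R session is allowed to use; besides the hardware, this also
## honors settings such as options(mc.cores), cgroups limits and job
## scheduler variables (e.g. Slurm), and returns the smallest of them
n_session <- availableCores()

cat("machine:", n_machine, " session:", n_session, "\n")
```

Since `availableCores()` is the default `workers` value for `plan(multisession)` and `plan(multicore)`, respecting the session's allocation is the default behavior.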
Just writing down my thoughts: there may be so many different `f*apply()` approaches and strategies (e.g. various types of splitting/chunking) that it would make sense to have a separate package on top of future, e.g. fplyr.
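As an illustration of one such chunking strategy, here is a minimal sketch that creates one future per chunk of elements rather than one per element. The helper `chunked_flapply()` is hypothetical (not part of the future API), and splitting into `nbrOfWorkers()` consecutive chunks is just one of many possible choices:

```r
library(future)

## Hypothetical helper: one future per chunk of X instead of one per element
chunked_flapply <- function(X, FUN, ..., chunks = nbrOfWorkers()) {
  FUN <- match.fun(FUN)
  args <- list(...)   # extra arguments for FUN, exported with each future
  n <- length(X)
  if (n == 0L) return(list())
  ## At most one chunk per element; min() first so Inf workers is handled
  chunks <- max(1L, as.integer(min(chunks, n)))
  ## Consecutive, nearly equal-sized index chunks
  idx <- parallel::splitIndices(n, chunks)
  fs <- lapply(idx, function(ii) {
    X_ii <- X[ii]  # subset outside the future: only this chunk is exported
    future(do.call(lapply, c(list(X_ii, FUN), args)))
  })
  ## Concatenate the per-chunk results in the original order (names kept)
  do.call(c, lapply(fs, value))
}

plan(multisession, workers = 2)
y <- chunked_flapply(1:10, function(x) x^2)
plan(sequential)
```

Subsetting `X[ii]` before calling `future()` means each future only carries its own chunk, which is the same export concern raised further down in this thread.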
In addition to `lapply()`, another common need will be `apply()` with futures; cf. `parallel::parApply()` etc. Maybe the following is good enough for now?
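```r
library("future")

fapply <- function(X, MARGIN, FUN, ...) {
  FUN <- match.fun(FUN)
  ## One future per slice; list() keeps apply() from simplifying the futures
  fFUN <- function(x, ...) list(future(FUN(x, ...)))
  fs <- apply(X, MARGIN = MARGIN, FUN = fFUN, ...)
  res <- lapply(fs, function(f) value(f[[1L]]))  ## Collect the values
  simplify2array(res)
}
```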
Ah, we need to be careful about what we're exporting. In particular, using

```r
res[[ii]] %<-% FUN(x[[ii]], ...)
```

or

```r
res[[ii]] <- future(FUN(x[[ii]], ...))
```

will require that all of `x` is exported. It's more efficient to subset outside the future expression, i.e.

```r
x_ii <- x[[ii]]
res[[ii]] %<-% FUN(x_ii, ...)
```

and

```r
x_ii <- x[[ii]]
res[[ii]] <- future(FUN(x_ii, ...))
```
See https://github.com/ilarischeinin/QDNAseq/pull/1 for a real-world example.
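future identifies what to export with the help of the globals package. As a rough check (an approximation, not necessarily identical to future's own export logic), `globals::globalsOf()` can be called on both variants of the expression; the objects `x`, `ii` and `FUN` below are illustrative assumptions:

```r
library(globals)

## Illustrative setup: a large list 'x', an index 'ii' and a function 'FUN'
x <- lapply(1:100, function(i) rnorm(1e4))
ii <- 3L
FUN <- function(v) mean(v)

## Subsetting inside the expression: the whole of 'x' becomes a global
g_inside <- globalsOf(quote(FUN(x[[ii]])), envir = environment())

## Subsetting outside the expression: only the element 'x_ii' is a global
x_ii <- x[[ii]]
g_outside <- globalsOf(quote(FUN(x_ii)), envir = environment())

print(names(g_inside))
print(names(g_outside))
print(c(inside = object.size(g_inside), outside = object.size(g_outside)))
```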
Should we have a separate vignette on "best practices"?
UPDATE: I've created the doFuture package, which brings future support (pun!) to foreach, which in turn brings full future support to plyr. In other words, any type of future can be used for plyr:ing, e.g.
```r
library("doFuture")
registerDoFuture()
plan(multisession)
library("plyr")
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))
llply(x, quantile, probs = 1:3/4, .parallel = TRUE)
```
Obviously, it is unlikely that the `*ply()` functions are as efficient (in memory and speed; mostly memory) as highly customized apply functions that are aware of futures, but this is certainly a good start, and it opens up a huge, well-established API.
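Because doFuture registers an ordinary foreach backend, the same setup also works with plain foreach loops. A minimal sketch, assuming the foreach and doFuture packages are installed (two background workers chosen arbitrarily):

```r
library(foreach)
library(doFuture)

registerDoFuture()               # use futures as the foreach %dopar% backend
plan(multisession, workers = 2)  # any future strategy can be plugged in here

## Each iteration is evaluated via a future; results come back as a list
y <- foreach(i = 1:4) %dopar% {
  sqrt(i)
}

plan(sequential)
```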
I will refrain from creating a `*ply` API. For now, plyr can be used.
Should we add something like?
to the future API?