Closed HenrikBengtsson closed 8 years ago
I have added (for now in develop) a set of tests running scone with many different back-ends, including BiocParallel::SnowParam(workers=2, type="SOCK")
.
All tests pass and I also ran the vignette with this back-end with no issue. The particular example that @HenrikBengtsson references is now gone (we decided to not pursue zinb() imputation), but there are still calls to bplapply that look suspicious, like:
scaled <- bplapply(seq_len(NROW(sc_params)), function(i) scaling[[sc_params[i,2]]](imputed[[sc_params[i,1]]]), BPPARAM=bpparam)
However, this seems to work fine. Perhaps there is some automatic copy going on here? @HenrikBengtsson any thoughts?
Although I'm not a super savvy BiocParallel users, I don't think BiocParallel exports globals automatically, so I am a bit surprised that your most recent bplapply()
snippet
scaled <- bplapply(seq_len(NROW(sc_params)), function(i) scaling[[sc_params[i,2]]](imputed[[sc_params[i,1]]]), BPPARAM=bpparam)
doesn't give an error reporting on object 'scaling' not found
(or sc_params
or imputed
) when using an external R process as the backend such as SnowParam(workers=2, type="SOCK")
. Are you 100% sure your package tests covers that piece of code?
@mtmorgan, do you have any comments?
BiocParallel is behaving like snow for these backends, so the function and the environment in which it is defined are copied. So
register(SnowParam(2))
fun0 = function(i) a[i]
p0 = function() {
a = 1:2
bplapply(1:2, fun0) # fails
}
p1a = function() {
fun1 = function(i) a[i]
a = 1:2
bplapply(1:2, fun1) # works
}
p1b = function() {
a=1:2
bplapply(1:2, function(i) a[i])
}
p2 = function() {
fun1 = function(i) a[i]
a = 1:2
x = integer(1e8)
bplapply(1:2, fun1) # works, expensive!
}
leads to
> bpstart(bpparam())
> p0()
Error: BiocParallel errors
element index: 1, 2
first error: object 'a' not found
> system.time(p1a())
user system elapsed
0.004 0.000 0.074
> system.time(p2())
user system elapsed
1.164 0.096 1.583
> bpstop(bpparam())
I'm pretty sure too that a function defined at the (top level) of package actually has the package namespace serialized to the workers, so all variables in the package are available.
Ah, that could be why it works, i.e. they probably have the p1b()
case. Thxs Martin.
Thank you Martin!
Now I have a much better idea of what's going on. Which makes me worried about the large objects that are defined and hence exported. But that's a different issue, so I should close this one.
Just a comment after having talked about
bplapply()
with @drisso the other day;Make sure to test the functions that call
bplapply()
with other backends than sequential and multicore too. For instance, test them with backends that run in separate background R processes, e.g. PSOCK clusters (parallel::makeCluster(2L)
). Because, these require that all variables are properly exported to the worker (something you don't have to worry about when you use sequential and multicore workers).Example:
Here's an example where I suspect you'll get an error because globals (
zhat
,muhat
,coefs_pi
) are not exported: