Closed: mmp3 closed this issue 3 years ago
Here is a pretty long-winded explanation of what is most likely going on with your issue. Hopefully you will find it helpful!
First, the warning `Running in series; one of the packages doParallel, foreach or doSNOW is missing` is not actually an error. It comes from this part of the code:
https://github.com/adw96/DivNet/blob/bea62385d3dfa5e9423f894b2049f4a4138ae54a/R/MCmat.R#L66
Here is the context:

```r
if (ncores > 1) {
  warning("Running in series; one of the packages doParallel, foreach or doSNOW is missing")
}

Y.MH <- foreach(i = 1:N, .combine = 'acomb3', .multicombine = TRUE, .packages = "foreach") %do% {
  MH_path(i)
}
```
So if you don't have those packages installed but request more than one core, you trigger the warning and get the "series" version of the algorithm. In other words, installing those packages and running again isn't actually "fixing" anything; it just silences the warning and switches you to the parallel code path.
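For what it's worth, the general shape of that logic looks roughly like this. This is a hypothetical sketch, not DivNet's actual code: `run_mc` and the `sqrt(i)` body are made up for illustration; only the warning text and the `%do%` fallback come from the linked source.

```r
library(foreach)

run_mc <- function(N, ncores = 1) {
  # Check whether the parallel backend packages are available
  have_parallel <- requireNamespace("doParallel", quietly = TRUE) &&
    requireNamespace("doSNOW", quietly = TRUE)

  if (!have_parallel) {
    if (ncores > 1) {
      # User asked for parallelism we can't deliver: warn, then fall through
      warning("Running in series; one of the packages doParallel, foreach or doSNOW is missing")
    }
    # %do% always evaluates serially, regardless of ncores
    foreach(i = 1:N, .combine = c) %do% sqrt(i)
  } else {
    # Parallel path: register a cluster and use %dopar% instead
    cl <- parallel::makeCluster(ncores)
    doParallel::registerDoParallel(cl)
    on.exit(parallel::stopCluster(cl))
    foreach(i = 1:N, .combine = c) %dopar% sqrt(i)
  }
}
```

The key point is that the warning is issued on the *serial* branch, so seeing it means the parallel packages were never used at all.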
I think what is going on in your case is that you have run into a couple of interesting things about R. First is the way R stores numbers. When you say you get numbers ~1e-16, I'm betting they were something like `1.110223e-16`, right? Check this out:
```r
> (0.1 * 6) - (6 / 10)
[1] 1.110223e-16
> 0.6 - (6 / 10)
[1] 0
```
You would expect from a math standpoint that those two would be the same (i.e., 0), but numerically they are not (see https://stat.ethz.ch/pipermail/r-help/2012-March/307139.html for more).
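This is also why the usual advice is to compare doubles with a tolerance rather than `==`. A quick sketch using base R's `all.equal()`:

```r
x <- (0.1 * 6) - (6 / 10)

x == 0                   # FALSE: exact comparison fails on floating point
isTRUE(all.equal(x, 0))  # TRUE: all.equal() compares with a tolerance (~1.5e-8 by default)
abs(x) < 1e-12           # TRUE: or roll your own tolerance check
```

So if you ever need to ask "is this Bray-Curtis value really zero?", a tolerance check is the safe way to do it.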
If you look at the `.Machine` variable in R, you will get a lot of info about the numerical characteristics of the machine that R is running on. As far as I know, R is required to work with IEEE 754 floating point numbers (see https://stat.ethz.ch/R-manual/R-devel/library/base/html/double.html). These have 53 significant bits, so you get 15 or 16 significant decimal digits (see https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats).

Given that, check out some numbers from `.Machine` (on my machine at least):
```r
> .Machine$double.base
[1] 2
> .Machine$double.neg.ulp.digits
[1] -53
> .Machine$double.base ^ .Machine$double.neg.ulp.digits
[1] 1.110223e-16
> .Machine$double.neg.eps
[1] 1.110223e-16
```
So basically, depending on how things shake out, you can get numerical values of `1.110223e-16` when logically you would expect zero. It's similar to how you often see p-values reported as `< 2.2e-16` by R (see https://stats.stackexchange.com/a/78840).
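That `2.2e-16` is not arbitrary either; it's `.Machine$double.eps` (i.e., 2^-52, the smallest `x` such that `1 + x != 1`), just printed with fewer digits:

```r
> .Machine$double.eps
[1] 2.220446e-16
> 2 ^ .Machine$double.ulp.digits
[1] 2.220446e-16
```

So both the `1.110223e-16` entries you saw and the familiar `< 2.2e-16` p-values trace back to the same machine-epsilon constants.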
Okay, finally: why did the numbers change to zeros when you reran after installing `doSNOW`? Here is the second interesting thing you've run into: how R deals with random number generation. Here is an example from my computer, which does have `doSNOW` installed.
```r
library(doSNOW)
library(DivNet)

data(Lee)

seed <- 12345

set.seed(seed)
divnet_phylum <- divnet(tax_glom(Lee, taxrank = "Phylum"),
                        X = "char",
                        ncores = 1,
                        B = 2)
a <- divnet_phylum[["bray-curtis"]][c("BW1", "R10", "R11BF", "R2"), "R6"]

set.seed(seed)
divnet_phylum <- divnet(tax_glom(Lee, taxrank = "Phylum"),
                        X = "char",
                        ncores = 1,
                        B = 2)
b <- divnet_phylum[["bray-curtis"]][c("BW1", "R10", "R11BF", "R2"), "R6"]

set.seed(seed)
divnet_phylum <- divnet(tax_glom(Lee, taxrank = "Phylum"),
                        X = "char",
                        ncores = 2,
                        B = 2)
c <- divnet_phylum[["bray-curtis"]][c("BW1", "R10", "R11BF", "R2"), "R6"]

set.seed(seed)
divnet_phylum <- divnet(tax_glom(Lee, taxrank = "Phylum"),
                        X = "char",
                        ncores = 2,
                        B = 2)
d <- divnet_phylum[["bray-curtis"]][c("BW1", "R10", "R11BF", "R2"), "R6"]

a
b
c
d
```
And here is the output of that:
```r
> a
         BW1          R10        R11BF           R2
2.212139e-01 1.655468e-01 3.063938e-01 1.110223e-16
> b
         BW1          R10        R11BF           R2
2.212139e-01 1.655468e-01 3.063938e-01 1.110223e-16
> c
      BW1       R10     R11BF        R2
0.2200620 0.1649343 0.3072189 0.0000000
> d
      BW1       R10     R11BF        R2
0.2156653 0.1688249 0.3038798 0.0000000
```
So what's going on? Notice how before each run I use `set.seed`? For `a` and `b`, I run divnet with 1 core. This makes the `foreach` code run serially, so the random seed is used, and `a` and `b` are the same since I reset it before each call to the `divnet` function. For the next two, I use 2 cores, which uses the parallel logic instead. Now, it is a quirk of how that works that the seed set at the top level is not the one used by the individual workers in the cluster. (This is the reason a package like doRNG was created.) So even though I use `set.seed` before each call to `divnet`, it only works in the first two cases. See how the `c` and `d` vectors are different from each other and from `a` and `b`.
Also notice that in the first two runs, the R2 vs R6 value is `1.110223e-16` rather than `0`. That's the numerical precision stuff again.
The random seed inside the parallel version of `foreach` isn't working properly. Fixing it isn't exactly straightforward, but one way is using a package like doRNG to manage random seeds inside the cluster. (Pinging @adw96 & @bryandmartin.)
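For anyone curious, here is a minimal sketch, independent of DivNet, of the problem and how doRNG addresses it. The `%dorng%` operator is a drop-in replacement for `%dopar%` that gives each worker a reproducible RNG stream derived from the current seed:

```r
library(doSNOW)
library(doRNG)

cl <- parallel::makeCluster(2)
registerDoSNOW(cl)

# %dopar% ignores the top-level seed: these two runs generally differ
set.seed(123)
x1 <- foreach(i = 1:4, .combine = c) %dopar% rnorm(1)
set.seed(123)
x2 <- foreach(i = 1:4, .combine = c) %dopar% rnorm(1)
identical(x1, x2)  # usually FALSE: workers have their own RNG state

# %dorng% seeds each worker's stream from the master seed
set.seed(123)
y1 <- foreach(i = 1:4, .combine = c) %dorng% rnorm(1)
set.seed(123)
y2 <- foreach(i = 1:4, .combine = c) %dorng% rnorm(1)
identical(y1, y2)  # TRUE: fully reproducible across runs

parallel::stopCluster(cl)
```

Wiring something like this into DivNet's parallel branch is what a proper fix would look like, but that is a change inside the package, not something a user can do from the outside.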
Great question @mmp3, and a huge thank you for the comprehensive and perfect response @mooreryan! I have very little to add except that the actual impact of the nasty irreproducible multiple-core random-seed issue can be somewhat mitigated by increasing the accuracy of the maximum likelihood search, including by setting `tuning` to `"fast"`, or better yet, increasing `EMiter` and `MCiter`. To increase the accuracy of the estimated variances, you can increase `B` (for the parametric bootstrap option) or `nsub` (for the nonparametric bootstrap option).
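For example, something along these lines. This is only a sketch: I'm assuming the `divnet()` interface where `tuning` can also be given as a list of control parameters, and the specific iteration counts here are arbitrary, so check `?divnet` on your install for the exact argument names and sensible values:

```r
library(DivNet)

data(Lee)
lee_phylum <- tax_glom(Lee, taxrank = "Phylum")

# More EM and MC iterations for a more accurate ML search,
# and a larger B for more accurate parametric-bootstrap variances.
# All numbers below are illustrative, not recommendations.
divnet_phylum <- divnet(lee_phylum,
                        X = "char",
                        tuning = list(EMiter = 10, EMburn = 5,
                                      MCiter = 1000, MCburn = 500),
                        B = 10,
                        ncores = 1)
```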
The trade-off for retaining more accuracy (and more robustness to Monte Carlo variation) is time: both of the above will increase the runtime of `divnet`.
Thanks for the ping @mooreryan - at this time it's not a priority for us to fix this random seed issue (and we are a little stretched for resources - as you can tell by the date stamps), but we welcome help from anyone on this!
Thank you both for your great questions and answers.
`divnet` fails silently if one of the parallelization packages is not installed. Checking the warnings:

```
Running in series; one of the packages doParallel, foreach or doSNOW is missing
```

and many of the entries of the `res$"bray-curtis"` matrix are ~1e-16. It turns out that the `doSNOW` package was not installed on my system. After installing `doSNOW`, those warnings do not happen anymore and the `res$"bray-curtis"` matrix has entries >> 1e-16.