This is a test with the Lee dataset. I'll put up an Rmd vignette that generates a bunch of data shortly.
Here is a screenshot of the profile output for the Lee dataset subsetted to 500 taxa, running the following divnet call: `divnet(lee500, X = "char", ncores = 1, B = 5)`.

[screenshot: profiling output for the above divnet call]
`fit_aitchison` takes most of the time (~85%). Most of the time in that function is split between `run_mc` (~42%) and `default_network` (~31%). The rest of the runtime is in the `make_w` function (~15%), with most of that function's runtime dominated by the call to `MASS::mvrnorm`. It might be worth looking at switching to a faster multivariate normal distribution simulator.
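For anyone who wants to reproduce this, here is a minimal sketch of how such a profile can be generated. This is my suggestion, not necessarily the tool used for the screenshot; it assumes `lee500` is the 500-taxon subset described above and that the profvis package is installed.

```r
# Sketch only: interactively profile the divnet call with profvis.
# `lee500` is assumed to be the Lee dataset subsetted to 500 taxa.
library(profvis)
library(DivNet)

profvis({
  divnet(lee500, X = "char", ncores = 1, B = 5)
})
```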
This is so fantastic, @mooreryan! We are just blown away that you implemented this -- thank you! Let me know when it's ready for code review.
More updates:
> `fit_aitchison` takes most of the time (~85%). Most of the time in that function is split between `run_mc` (~42%) and `default_network` (~31%).
I sped up `default_network` by about half. There was a call to `svd` to check for errors before the `MASS::ginv` call here:

https://github.com/mooreryan/DivNet/blob/e1f5aa3950b37fda0b7eec8aa56aa44a484fabe3/R/networks.R#L10
However, here is the code for `ginv`:

```r
function (X, tol = sqrt(.Machine$double.eps))
{
    if (length(dim(X)) > 2L || !(is.numeric(X) || is.complex(X)))
        stop("'X' must be a numeric or complex matrix")
    if (!is.matrix(X))
        X <- as.matrix(X)
    Xsvd <- svd(X)
    if (is.complex(X))
        Xsvd$u <- Conj(Xsvd$u)
    Positive <- Xsvd$d > max(tol * Xsvd$d[1L], 0)
    if (all(Positive))
        Xsvd$v %*% (1/Xsvd$d * t(Xsvd$u))
    else if (!any(Positive))
        array(0, dim(X)[2L:1L])
    else Xsvd$v[, Positive, drop = FALSE] %*% ((1/Xsvd$d[Positive]) *
        t(Xsvd$u[, Positive, drop = FALSE]))
}
```
It does an SVD as one of its first steps, so if the test `svd` call were to fail, `ginv` would fail in the same way. So I removed the extra `svd` call and adjusted the error message. This saved about half of the runtime of the `default_network` function.
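For illustration, a hedged sketch of the shape of that change (the wrapper name and message text are hypothetical; the real edit is in R/networks.R at the link above):

```r
# Sketch only: rather than running svd(X) once just to detect failures and
# then letting MASS::ginv(X) repeat the SVD internally, call ginv() directly
# and re-wrap any error. `safe_ginv` is an illustrative name.
safe_ginv <- function(x) {
  tryCatch(
    MASS::ginv(x),
    error = function(e) {
      stop("ginv (and its internal svd) failed: ", conditionMessage(e))
    }
  )
}
```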
> The rest of the runtime is in the `make_w` function (~15%), with most of that function's runtime being dominated by the call to `MASS::mvrnorm`. It might be worth looking at switching to a faster multivariate normal distribution simulator.
I tried replacing `MASS::mvrnorm` with the mvnfast version of the same function. However, it failed, since the mvnfast version uses a Cholesky decomposition (which failed for the Lee dataset) rather than the eigendecomposition the MASS version uses.
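A small illustration of why that swap can fail (assuming the mvnfast function in question is `mvnfast::rmvn`): a Cholesky factorization requires a strictly positive definite covariance matrix, while the eigendecomposition route in `MASS::mvrnorm` tolerates a positive semi-definite one.

```r
# Illustrative only: a rank-deficient covariance matrix
# (positive semi-definite, but not positive definite).
Sigma <- matrix(c(1, 1,
                  1, 1), nrow = 2)

MASS::mvrnorm(n = 3, mu = c(0, 0), Sigma = Sigma)  # works (eigendecomposition)
mvnfast::rmvn(n = 3, mu = c(0, 0), sigma = Sigma)  # should error (Cholesky)
```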
Also, I tried replacing pretty much every little function that was taking a non-trivial amount of time (`svd`, `ginv`, etc.) with pure Eigen versions, but they were slower, since the overhead of moving R matrices in and out of Rcpp outweighed the gains.
@adw96 The Travis checks are working now, but I'm not sure how to get AppVeyor working. Are you all still using AppVeyor?
More updates....
I'm now working on trying to get some speedup from multiple cores. However, nothing I've tried so far has really worked. Currently, the code parallelizes the `replicate` call, which helps a little but not much. The original `MCmat` function parallelized over the samples (it ran the MC loop for each sample in parallel); on my machine, that didn't help much either. I also tried implementing the parallelism in the Rcpp code using RcppParallel, but again, not much help.
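For context, the general pattern being tried looks something like the following. This is illustrative only: `mc_draw` is a stand-in workload, not a DivNet function.

```r
# Illustrative only: a replicate() call and a forked-parallel equivalent.
# Note that mclapply() forks, so mc.cores > 1 has no effect on Windows.
library(parallel)

mc_draw <- function() mean(rnorm(1e5))  # stand-in for one MC replicate

serial_res   <- replicate(8, mc_draw())
parallel_res <- unlist(mclapply(1:8, function(i) mc_draw(), mc.cores = 2))
```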
According to this StackOverflow answer, part of the issue may be memory bandwidth (as opposed to total available memory). I did a test where I ran the same `divnet` call in three different R sessions at the same time, and each instance took about twice as long as a single instance running alone, which suggests memory bandwidth may be a problem.
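A rough sketch of that experiment (the script name is a placeholder, and `lee500` is as above): save the script to a file, launch it simultaneously from several terminals with `Rscript`, and compare the per-instance wall times against a single run. If wall time grows with the number of concurrent instances, memory bandwidth rather than CPU is the likely bottleneck.

```r
# Sketch only: time one divnet() run. Launch several copies of this script
# at once (e.g. `Rscript profile_divnet.R` in three terminals).
library(DivNet)

# `lee500` is assumed to be loaded or constructed here.
elapsed <- system.time(
  divnet(lee500, X = "char", ncores = 1, B = 5)
)["elapsed"]
message("wall time (s): ", elapsed)
```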
Looking at the code, I think it could be made more efficient by having a better memory layout for all the data. I'm thinking that there's probably a bunch of cache misses. I've used callgrind with kcachegrind to find cache misses in regular C code, but never before with Rcpp.
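For what it's worth, the same valgrind workflow should apply to Rcpp code, since it is ordinary compiled code once loaded. A sketch (the script name is a placeholder; the `-d` debugger flag is described in "Writing R Extensions"):

```r
# Sketch only: run an R script under callgrind with cache simulation turned
# on, then inspect the resulting callgrind.out.* file with kcachegrind.
system('R -d "valgrind --tool=callgrind --cache-sim=yes" -f profile_divnet.R')
```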
Sorry, we deleted the AppVeyor checks -- we decided Travis was sufficient. We should have let you know, sorry about that!
I've had a bit of a think about the architecture, and I'm not sure how multiple cores could help. One of the challenges of EM-MH algorithms is that they don't parallelize very easily (unlike, e.g., a Bayesian sampling procedure with multiple chains). If someone were really cautious, they could run it using different tuning parameters on different cores to check convergence, but I think that's solving a problem we don't have.
> Looking at the code, I think it could be made more efficient by having a better memory layout for all the data. I'm thinking that there's probably a bunch of cache misses. I've used callgrind with kcachegrind to find cache misses in regular C code, but never before with Rcpp.
Unfortunately I don't know anything about that -- this is beyond the scope of my R knowledge.
Merging #35 into master will increase coverage by 0.84%. The diff coverage is 78.26%.
```diff
@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
+ Coverage   67.56%   68.41%    +0.84%
==========================================
  Files          12       12
  Lines         592      649       +57
==========================================
+ Hits          400      444       +44
- Misses        192      205       +13
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| R/networks.R | 36.84% <36.84%> (ø) | |
| R/estimate_diversity.R | 89.71% <45.83%> (-7.19%) | :arrow_down: |
| src/mc.cpp | 94.52% <94.52%> (ø) | |
| R/fit_aitchison.R | 92.5% <95.45%> (+0.19%) | :arrow_up: |
| R/utility.R | 61.11% <0%> (-5.56%) | :arrow_down: |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 0c19b88...5d6a8d0.
Hi @mooreryan,

Thanks for all your work on this. The prospect of a major speed upgrade is very exciting. However, it seems to substantially change some of the estimates that we test for in our unit tests. For example, the following code should all evaluate to `TRUE`:
```r
n_taxa <- 6
n <- 6
my_counts <- matrix(c(rep(c(rep(100, n_taxa/2), rep(1, n_taxa/2)), n/2),
                      rep(c(rep(1, n_taxa/2), rep(100, n_taxa/2)), n/2)),
                    nrow = n, ncol = n_taxa, byrow = TRUE)
my_discrete_covariate <- cbind(1, rep(c(0, 1), each = n/2))

fa <- fit_aitchison(my_counts, X = my_discrete_covariate, base = 1)

abs(fa$fitted_z[1, 1] - 1/3) <= 1e-2
abs(fa$fitted_z[4, 4] - 1/3) <= 1e-2
```
I get some slight differences with your code. Do you know why this might be?

Thanks, Bryan
@bryandmartin You're correct in that the code is not quite right yet!

I will have to go back through the code again, as it's been a while since I looked at it, but yes -- back in the summer I saw that many of the unit tests were failing with the Rcpp code :(
I'm closing this pull request. I'm going to open a new one that should take the place of this one.
Issue #32 described problems with running DivNet on datasets with lots of OTUs/ASVs/taxa. This pull request is a first attempt at speeding things up.
The main change is replacing `MCmat` and `MCrow` with Rcpp variants; the sketch below shows the general pattern.
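For a flavor of the approach (illustrative only; the real replacements live in src/mc.cpp and are considerably more involved), this is the usual Rcpp pattern of moving a hot R-level loop into C++:

```r
# Illustrative only: compile a small C++ function from R with Rcpp.
library(Rcpp)

cppFunction('
  NumericVector row_sums_cpp(NumericMatrix x) {
    int n = x.nrow();
    NumericVector out(n);
    for (int i = 0; i < n; ++i) out[i] = sum(x(i, _));
    return out;
  }
')

row_sums_cpp(matrix(1:6, nrow = 2))  # 9 12
```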
I also tried a bunch of different ways of parallelizing the code, including RcppParallel, but none were very promising. That would be something for future work.
I'd say that this is still in the draft stage, but I wanted to submit the (draft) pull request to get some feedback.