adw96 / DivNet

diversity estimation under ecological networks

Rcpp #35

Closed mooreryan closed 4 years ago

mooreryan commented 5 years ago

Issue #32 raised performance problems with running DivNet on datasets with lots of OTUs/ASVs/taxa. This pull request is a first attempt at speeding things up.

The main change is replacing MCmat and MCrow with Rcpp variants; the snippet below sketches the general pattern.
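To make that concrete, here is a toy sketch of this kind of swap; row_softmax is a made-up stand-in for illustration, not DivNet's actual MCrow internals.

library(Rcpp)

cppFunction('
NumericVector row_softmax(NumericVector x) {
  // exp-normalize one row: the kind of per-row arithmetic that is cheap
  // in compiled code but slow when repeated many times in R
  NumericVector e = exp(x - max(x));
  return e / sum(e);
}')

row_softmax(c(1, 2, 3))  # called from R exactly like the function it replaces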

I also tried a bunch of different ways of parallelizing the code, including RcppParallel, but none of them were that promising. That would be something for future work.

I'd say that this is still in the draft stage, but I wanted to submit the (draft) pull request to get some feedback.

mooreryan commented 5 years ago

This is a test with the Lee dataset. I'll put up an Rmd vignette that generates a bunch of data shortly.

[Figure: DivNet runtime in seconds vs. number of taxa]

mooreryan commented 5 years ago

[Screenshot: profiling output for the Lee dataset]

Here is a screenshot of the profile output for the Lee dataset subsetted to 500 taxa, running the following divnet call: divnet(lee500, X = "char", ncores = 1, B = 5).

fit_aitchison takes most of the time (~85%). Most of the time in that function is split between run_mc (~42%) and default_network (~31%).

The rest of the runtime is in the make_w function (~15%), with most of that function's runtime dominated by the call to MASS::mvrnorm. It might be worth looking at switching to a faster multivariate normal simulator.
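If anyone wants to reproduce this kind of profile, something like the following works, assuming the same lee500 object as above (profvis is just my suggestion here; base R's Rprof gives the same information in text form):

library(profvis)

profvis({
  divnet(lee500, X = "char", ncores = 1, B = 5)
})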

adw96 commented 5 years ago

This is so fantastic, @mooreryan! We are just blown away that you implemented this -- thank you! Let me know when it's ready for code review.

mooreryan commented 5 years ago

More updates:

default_network function

fit_aitchison takes most of the time (~85%). Most of the time in that function is split between run_mc (~42%) and default_network (~31%).

I cut the runtime of default_network roughly in half. There was a call to svd to check for errors before the MASS::ginv call here:

https://github.com/mooreryan/DivNet/blob/e1f5aa3950b37fda0b7eec8aa56aa44a484fabe3/R/networks.R#L10

However, looking at the source of ginv:

function (X, tol = sqrt(.Machine$double.eps)) 
{
    if (length(dim(X)) > 2L || !(is.numeric(X) || is.complex(X))) 
        stop("'X' must be a numeric or complex matrix")
    if (!is.matrix(X)) 
        X <- as.matrix(X)
    Xsvd <- svd(X)
    if (is.complex(X)) 
        Xsvd$u <- Conj(Xsvd$u)
    Positive <- Xsvd$d > max(tol * Xsvd$d[1L], 0)
    if (all(Positive)) 
        Xsvd$v %*% (1/Xsvd$d * t(Xsvd$u))
    else if (!any(Positive)) 
        array(0, dim(X)[2L:1L])
    else Xsvd$v[, Positive, drop = FALSE] %*% ((1/Xsvd$d[Positive]) * 
        t(Xsvd$u[, Positive, drop = FALSE]))
}

ginv itself performs an SVD as one of its first steps, so if the preliminary svd call were to fail, ginv would fail in the same way. I therefore removed the redundant svd call and adjusted the error message, which saved about half of the runtime of the default_network function.
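The change has roughly this shape (a hedged sketch, not the exact DivNet code; sigma is a toy stand-in for the covariance matrix handled in default_network):

library(MASS)

sigma <- diag(3)  # toy stand-in

# before: an explicit svd(sigma) call whose only purpose was to surface
# decomposition errors before MASS::ginv
# after: call ginv directly and report its failure, since ginv performs
# the same SVD internally anyway
sigma_inv <- tryCatch(
  MASS::ginv(sigma),
  error = function(e) stop("SVD failed inside MASS::ginv: ", conditionMessage(e))
)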

make_w function

The rest of the runtime is in the make_w function (~15%), with most of that function's runtime dominated by the call to MASS::mvrnorm. It might be worth looking at switching to a faster multivariate normal simulator.

I tried replacing MASS::mvrnorm with the mvnfast version of the same function. It failed, however, because the mvnfast version uses a Cholesky decomposition (which failed for the Lee dataset) rather than the eigendecomposition the MASS version uses.
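Here is a small self-contained illustration of that failure mode; the rank-deficient sigma below is a stand-in for the fitted covariance, not the Lee data itself.

library(MASS)
library(mvnfast)

mu <- rep(0, 3)
sigma <- matrix(1, 3, 3)  # rank 1: positive semi-definite but singular

# MASS::mvrnorm uses an eigendecomposition and tolerates semi-definiteness
w_mass <- MASS::mvrnorm(n = 10, mu = mu, Sigma = sigma)

# mvnfast::rmvn uses chol(), which errors on a matrix that is not
# numerically positive definite
w_fast <- tryCatch(mvnfast::rmvn(n = 10, mu = mu, sigma = sigma),
                   error = function(e) conditionMessage(e))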

Other stuff

I also tried replacing pretty much every little function that was taking a non-trivial amount of time with pure Eigen versions (svd, ginv, etc.), but they were slower because of the overhead of copying R matrices into and out of Rcpp.
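For example, a swap of that kind looks like the following (eigen_svd_d is a throwaway name for illustration, and this assumes RcppEigen is installed; it is not code from this PR):

library(Rcpp)

cppFunction(depends = "RcppEigen", '
Eigen::VectorXd eigen_svd_d(const Eigen::MatrixXd &X) {
  // X is copied from R on the way in; the result is copied back out
  Eigen::BDCSVD<Eigen::MatrixXd> svd(X);
  return svd.singularValues();
}')

X <- matrix(rnorm(300 * 300), 300, 300)
system.time(for (i in 1:20) svd(X, nu = 0, nv = 0))  # base R (LAPACK)
system.time(for (i in 1:20) eigen_svd_d(X))          # Eigen plus copy overhead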

mooreryan commented 5 years ago

@adw96 The Travis checks are working now, but I'm not sure how to get AppVeyor working. Are you all still using AppVeyor?

mooreryan commented 5 years ago

More updates:

I'm now trying to get some speedup from multiple cores, but nothing I've tried so far has really worked. Currently the code parallelizes the replicate call, which helps a bit but not much. The original MCmat function parallelized over the samples (it ran the MC loop for each sample in parallel); on my machine that didn't help much either. I also tried implementing the parallelism in the Rcpp code using RcppParallel, but again, not much help.
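For reference, parallelizing a replicate() call generally has this shape (a runnable toy; run_one_mc is an illustrative stand-in, not the actual function name):

library(parallel)

B <- 8
run_one_mc <- function(b) mean(rnorm(1e5))  # stand-in for one MC replicate

# serial:   replicate(B, run_one_mc(), simplify = FALSE)
# parallel over replicates (forked workers, so Unix-alikes only):
mc_reps <- mclapply(seq_len(B), run_one_mc, mc.cores = 2)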

According to this StackOverflow answer, part of the issue may be memory bandwidth (as opposed to total available memory). I did a test where I ran the same divnet call in three different R sessions at the same time, and each instance took about twice as long as running a single instance alone, which suggests memory bandwidth may be a problem.

Looking at the code, I think it could be made more efficient by having a better memory layout for all the data. I'm thinking that there's probably a bunch of cache misses. I've used callgrind with kcachegrind to find cache misses in regular C code, but never before with Rcpp.

adw96 commented 5 years ago

Sorry, we deleted the AppVeyor checks -- we decided Travis was sufficient. We should have let you know, sorry about that!

I've had a bit of a think about the architecture, and I'm not sure how multiple cores could help. One of the challenges of EM-MH algorithms is that they don't parallelize very easily (unlike, e.g., a Bayesian sampling procedure with multiple chains). If someone were really cautious they could run it with different tuning parameters on different cores to check convergence, but I think that's solving a problem we don't have.

adw96 commented 5 years ago

Looking at the code, I think it could be made more efficient by having a better memory layout for all the data. I'm thinking that there's probably a bunch of cache misses. I've used callgrind with kcachegrind to find cache misses in regular C code, but never before with Rcpp.

Unfortunately I don't know anything about that -- this is beyond the scope of my R knowledge.

codecov-io commented 5 years ago

Codecov Report

Merging #35 into master will increase coverage by 0.84%. The diff coverage is 78.26%.


@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
+ Coverage   67.56%   68.41%   +0.84%     
==========================================
  Files          12       12              
  Lines         592      649      +57     
==========================================
+ Hits          400      444      +44     
- Misses        192      205      +13
Impacted Files           Coverage Δ
R/networks.R             36.84% <36.84%> (ø)
R/estimate_diversity.R   89.71% <45.83%> (-7.19%) ↓
src/mc.cpp               94.52% <94.52%> (ø)
R/fit_aitchison.R        92.50% <95.45%> (+0.19%) ↑
R/utility.R              61.11% <0%> (-5.56%) ↓


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0c19b88...5d6a8d0.

bryandmartin commented 4 years ago

Hi @mooreryan ,

Thanks for all your work on this. The prospect of a major speed upgrade is very exciting. However, it seems to substantially change some of the estimates that we check in our unit tests. For example, the following code should all evaluate to TRUE:

n_taxa <- 6
n <- 6
# two groups of samples with mirrored high/low counts across taxa
my_counts <- matrix(c(rep(c(rep(100, n_taxa / 2), rep(1, n_taxa / 2)), n / 2),
                      rep(c(rep(1, n_taxa / 2), rep(100, n_taxa / 2)), n / 2)),
                    nrow = n, ncol = n_taxa, byrow = TRUE)
my_discrete_covariate <- cbind(1, rep(c(0, 1), each = n / 2))
fa <- fit_aitchison(my_counts, X = my_discrete_covariate, base = 1)

abs(fa$fitted_z[1, 1] - 1/3) <= 1e-2
abs(fa$fitted_z[4, 4] - 1/3) <= 1e-2

I get some slight differences with your code. Do you know why that might be?

Thanks, Bryan

mooreryan commented 4 years ago

@bryandmartin You're right that the code is not quite correct yet!

I will have to go back through the code again since it's been a while since I looked at it, but yeah, back in the summer I saw that many of the unit tests were failing with the Rcpp code :(

mooreryan commented 4 years ago

I'm closing this pull request. I'm going to open a new one that should take the place of this one.