Closed amorris28 closed 3 years ago
Hi Andrew! Thanks so much again for using DivNet. Some thoughts:
I would recommend network="diagonal" for a dataset of this size. This means you're allowing overdispersion (compared to a plugin aka multinomial model) but not a network structure. This isn't just about computational expense -- it's about the reliability of the network estimates. Estimating network structure on 20k variables (taxa) from 50 samples with any kind of reliability is going to be very challenging, and I don't think that it's worth doing here. In our simulations we found that overdispersion contributes the bulk of the variance to diversity estimation (i.e. overdispersion is more important than network structure), so I don't think you are going to lose too much anyway.
You can also adjust the tuning argument. The default is
list(EMiter = 6, EMburn = 3, MCiter = 500, MCburn = 250)
Doing fewer EMiters and MCiters reduces runtime. Perhaps try
list(EMiter = 6, EMburn = 3, MCiter = 250, MCburn = 100)
If you're worried that it's stalling out entirely, to check that it runs, try
list(EMiter = 6, EMburn = 3, MCiter = 10, MCburn = 5)
Note that we parallelise over MCiter. This is a great test case for us so thanks for bringing it to our attention! Never in my wildest dreams did I think that someone would try to run this with 20k taxa. (My imagination stops at around 5k.) I guess I need to work with more soil!
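The settings above can be collected into one place, roughly as in the sketch below. The tuning values are the ones quoted in this thread; the names tuning_presets and run_divnet_diagonal are hypothetical helpers for illustration and are not part of DivNet. The wrapper assumes divnet accepts network and tuning as named arguments, as used in this thread; check ?divnet for your installed version.

```r
# Tuning presets quoted in this thread: the default, a cheaper run,
# and a very short run that only checks the fit does not stall.
tuning_presets <- list(
  default    = list(EMiter = 6, EMburn = 3, MCiter = 500, MCburn = 250),
  reduced    = list(EMiter = 6, EMburn = 3, MCiter = 250, MCburn = 100),
  smoke_test = list(EMiter = 6, EMburn = 3, MCiter = 10,  MCburn = 5)
)

# Hypothetical wrapper (not part of DivNet): fits the diagonal-network
# model with one of the presets above. Any further arguments, such as
# ncores, are passed through to divnet unchanged.
run_divnet_diagonal <- function(physeq, preset = "reduced", ...) {
  stopifnot(preset %in% names(tuning_presets))
  DivNet::divnet(
    physeq,
    network = "diagonal",
    tuning  = tuning_presets[[preset]],
    ...
  )
}
```

For example, run_divnet_diagonal(physeq, preset = "smoke_test") would check that the model runs at all before committing to a longer fit with the reduced or default preset.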
Amy
@bryandmartin Anything you want to add?
Hey Amy!
I'm glad this is a helpful case for you all. I would love to use DivNet going forward, and this is not an atypical data set for our lab group, so getting to know how to make it work will be super helpful. I will try network='diagonal' and playing with the tuning argument to see how things work. Let me know how your simulations go.
Thank you for the quick turn-around! Andrew
Ok a quick update (a bigger sim to come): time-vs-q.pdf
Conclusions:
I'm upscaling q (number of taxa) and will see how the trends continue.
I was having a similar issue to the original poster (see issue #28). Large numbers of ASVs/OTUs/taxa really aren't feasible, it seems.
I've also found that the ncores option really doesn't provide much benefit.
In a comment on a previous pull request (https://github.com/adw96/DivNet/pull/29#issuecomment-510617485), I found that the MCrow (and MCmat) functions are taking the most CPU time, so today I started work on rewriting those functions in Rcpp. Still working some kinks out of it, but it's definitely faster.
This might be helpful for the original poster as well... While working on speeding up the divnet function, I made this little graph of how runtime scales with the number of taxa. The dataset is the included Lee dataset.
If that trend holds for very large numbers of taxa (not sure if it actually would), then running ~20,000 ASVs would take at least a couple of hours.
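One rough way to check an extrapolation like this against your own timings is sketched below. It assumes runtime follows a power law in the number of taxa q, which this thread does not confirm, and extrapolate_runtime is a hypothetical helper, not part of DivNet. You would supply your own measured q and seconds values.

```r
# Hypothetical helper: fit log(seconds) ~ log(q) by least squares and
# extrapolate to a larger number of taxa. Assumes a power-law trend,
# seconds = a * q^b, which may not hold for very large q.
extrapolate_runtime <- function(q, seconds, q_new) {
  stopifnot(length(q) == length(seconds), length(q) >= 2,
            all(q > 0), all(seconds > 0), all(q_new > 0))
  timings <- data.frame(log_q = log(q), log_s = log(seconds))
  fit <- lm(log_s ~ log_q, data = timings)
  predicted <- exp(predict(fit, newdata = data.frame(log_q = log(q_new))))
  list(
    exponent = unname(coef(fit)["log_q"]),  # estimated b in q^b
    seconds  = unname(predicted)            # predicted runtime at q_new
  )
}
```

An estimated exponent well above 1 would suggest that runs with tens of thousands of taxa grow much faster than linearly, so the prediction is best treated as a lower bound on what to expect.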
This is fantastic to know, @mooreryan! EM-MH algorithms are really well-suited to Rcpp but we just haven't been able to prioritise rewriting it. We would be so rapt if you were to implement it, and we would love to add you as a package coauthor/maintainer.
Hey everyone,
Really appreciate the work everyone has put into divnet. Amy, you mentioned that a diagonal matrix is most appropriate for a large number of taxa and a small number of samples, as you cannot reliably estimate the interactions.
Do you think this holds true if I have 1000-2000 samples? The samples are from soil, and are geographically diverse.
Cheers, Chris
Moved over from twitter.
I'm trying to run divnet on ASVs with a dataset of 44 samples and 19,921 ASVs. No ASVs appear in all samples, so I've chosen a reference ASV that is present in 42 of the 44, indicated by ref_otu. I'm also leaving X = NULL with no design matrix, so I'm just trying to estimate diversity and confidence intervals for each sample. physeq is my phyloseq object. If I run this on a cluster with 28 cores and 128 GB of memory, I don't see any progress after ~30 minutes. Running locally on my 4 core, 16 GB machine it crashes, I think because it runs out of memory. Function call below:
Thank you for the help on this!