adw96 / DivNet

diversity estimation under ecological networks

How can I speed up the estimation and avoid the memory limit error? #64

Closed: Ivy-ops closed this issue 3 years ago

Ivy-ops commented 3 years ago

Hi, thanks for this amazing package; it's very useful.

I have a phyloseq object with 67,872 taxa and 59 samples. I tried

divnet_phyloseq <- divnet(W = my, base = 1, ncores = 4)

and got the error "vector memory exhausted (limit reached?)". How can I solve this problem? Thanks!!
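A rough back-of-the-envelope check of why a dataset this size can run out of memory. It assumes the fit holds at least one dense (Q - 1) x (Q - 1) covariance matrix of 8-byte doubles, where Q is the number of taxa. This is an assumption about the model's structure, not a measurement of DivNet's actual memory use. The helper dense_cov_gib is hypothetical and uses only base R.

```r
# Hypothetical helper: GiB needed for ONE dense (Q - 1) x (Q - 1) matrix of
# 8-byte doubles, where Q is the number of taxa. Assumption-based lower bound,
# not DivNet's measured memory footprint.
dense_cov_gib <- function(n_taxa) {
  q <- n_taxa - 1
  q^2 * 8 / 1024^3
}

dense_cov_gib(67872)  # about 34.3 GiB (the 67,872-taxon dataset above)
dense_cov_gib(9000)   # about 0.6 GiB
```

Under this assumption the matrix grows with the square of the number of taxa and does not depend on the number of samples, so reducing the taxon count has far more effect on memory than reducing the sample count.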

MicroIrene commented 3 years ago

Hi! I have a similar question, so I thought I would post it here. First, thank you for the amazing package; it is great!

I am working with soil samples. I have 120 samples and around 9,000 taxa, so far fewer taxa than in the original question, but DivNet still crashes on my computer, so I am looking for a way to avoid the memory limit error. I decided to subset my dataset by sample type, dividing it into 10 subsets of 12 samples each. Taxa that are absent from a subset are removed automatically, and each subset takes ~3 h to run, but it runs! I am just wondering whether this is correct, or am I making a big mistake by subsetting my data? I am not an expert in code, so I wanted to check that this is OK.
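The subsetting described above can be sketched in base R on a toy count table. The matrix, the sample names, and the sample_type grouping are made up for illustration; in practice the counts would come from the phyloseq object. Each subset keeps only its own samples and then drops taxa with zero total count, which is what shrinks the problem. This shows the mechanics only, not whether fitting the subsets separately is statistically appropriate.

```r
# Hypothetical toy count table: rows = samples, columns = taxa
counts <- matrix(c(5, 0, 3, 0,
                   2, 0, 1, 0,
                   0, 4, 0, 7,
                   0, 6, 0, 2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("s", 1:4), paste0("taxon", 1:4)))
# Hypothetical grouping variable, one entry per sample (row)
sample_type <- c("bulk", "bulk", "rhizo", "rhizo")

# Split sample names by group, then keep only taxa observed in each subset
subsets <- lapply(split(rownames(counts), sample_type), function(ids) {
  sub <- counts[ids, , drop = FALSE]
  sub[, colSums(sub) > 0, drop = FALSE]
})

lapply(subsets, colnames)
# bulk keeps taxon1 and taxon3; rhizo keeps taxon2 and taxon4
```

Each element of subsets would then go to its own divnet() call, so every fit works with fewer taxa than the full table.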

Thank you so much,

Irene

mooreryan commented 3 years ago

Hey y'all! I was running into a similar problem, so I wrote an alternative implementation of DivNet that can handle large datasets. I talked to @adw96 about releasing it a long while back, but never got it up onto GitHub until now.

@MicroIrene, I have used it on a dataset of ~10,000 ASVs with ~350 samples, and it ran in a reasonable amount of time. @Ivy-ops, I haven't tried it on any dataset with >10,000 taxa, but it should work if you have access to a machine with enough RAM (and plenty of time).

Feel free to try it out. It isn't as easy to use as @adw96's awesome package, but it will crank through large datasets at least!

MicroIrene commented 3 years ago

Hi @mooreryan,

Sorry for my slow reply; I have only just seen this! Thank you very much for this implementation! It sounds great, although a bit more complicated to run, but I will explore it. Thank you,

Irene