Closed: Ivy-ops closed this issue 3 years ago
Hi! I have a similar question and thought I would post it here. First, thank you for the amazing package, it is great!
I am working with soil samples. I have 120 samples and around 9,000 taxa, so far fewer taxa than in the original question, but it is still crashing on my computer, so I am looking for a way to avoid the memory limit error. I decided to subset my dataset by sample type, dividing it into 10 subsets. With 12 samples per subset, the absent taxa are automatically removed, and each run takes ~3 h, but it runs! I am just wondering if this is correct, or am I making a big mistake by subsetting my data? I am not an expert in code, so I wanted to check if this is OK.
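For anyone wanting to try the same workaround, here is a minimal sketch of the split-and-run approach, assuming a phyloseq object named `ps` with a sample metadata column called `SampleType` (both names are hypothetical placeholders for your own data):

```r
library(phyloseq)
library(DivNet)

# Hypothetical phyloseq object `ps` with a metadata column `SampleType`.
# Split the dataset by sample type, drop taxa absent from each subset,
# and run divnet on each smaller subset separately.
results <- lapply(unique(sample_data(ps)$SampleType), function(type) {
  ps_sub <- prune_samples(sample_data(ps)$SampleType == type, ps)
  ps_sub <- prune_taxa(taxa_sums(ps_sub) > 0, ps_sub)  # remove absent taxa
  divnet(ps_sub, ncores = 4)
})
```

One caveat to keep in mind: each subset is modeled on its own, so the co-occurrence structure DivNet estimates only uses information from the samples within that subset.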
Thank you so much,
Irene
Hey y'all! I was running into a similar problem to yours, so I wrote an alternate implementation of DivNet that can handle large datasets. I talked to @adw96 a long while back about releasing it, but never got it up onto GitHub until now.
@MicroIrene, I have used it on a dataset of ~10,000 ASVs and ~350 samples, and it finished in a reasonable amount of time. @Ivy-ops, I haven't tried it on any dataset with >10,000 taxa, but it should work if you have access to a machine with enough RAM (and plenty of time).
Feel free to try it out. It isn't as easy to use as @adw96's awesome package, but it will crank through large datasets at least!
Hi @mooreryan ,
Sorry for my slow reply, I have only just seen this! Thank you very much for this implementation! It sounds great, if a bit more complicated to run, but I will explore it. Thank you,
Irene
Hi, thanks for this amazing package, it's very useful.
I have a phyloseq object with 67,872 taxa and 59 samples. I tried
`divnet_phyloseq <- divnet(W = my, base = 1, ncores = 4)`
and got the error `vector memory exhausted (limit reached?)`. May I know how to solve this problem? Thanks!!
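One workaround worth trying for a dataset this wide (not a definitive fix, just a common way to shrink the problem): aggregate taxa to a higher taxonomic rank before running divnet, which greatly reduces the number of columns in the count matrix. A sketch, assuming a phyloseq object `ps` whose taxonomy table includes a `Genus` rank (both names are placeholders for your own data):

```r
library(phyloseq)
library(DivNet)

# Hypothetical phyloseq object `ps`: collapse ~68k taxa to genus level
# to shrink the count matrix before estimation.
ps_genus <- tax_glom(ps, taxrank = "Genus")
dv <- divnet(ps_genus, ncores = 4)
```

Alternatively, splitting the samples into smaller groups and running divnet on each subset (as described earlier in this thread) can also get you under the memory limit.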