Open kingtom2016 opened 3 years ago
I see this is an old comment but I will provide my answer here if anyone else needs to do this:
library(parallel)   # detectCores()
library(pbmcapply)  # pbmclapply()
library(iNEXT)      # iNEXT()
library(ggplot2)    # fortify()
Max_CPU_Cores <- detectCores()
# Use ~80% of the available cores, rounded to the nearest even number
Upper_Limit_CPU_Cores <- 2 * round((Max_CPU_Cores * 0.8) / 2)
# Parallel Rarefaction Function
# A working parallelized wrapper around iNEXT; roughly 5x faster than
# running it sequentially.
parallel_rarefaction <- function(shuffled_data){
  # Rarefaction/extrapolation for a single sample's abundance vector
  out_df <- iNEXT(as.vector(shuffled_data), q = 0, datatype = "abundance")
  # Convert the iNEXT object into a ggplot-ready data frame
  df <- fortify(out_df, type = 1)
  return(df)
}
This creates a variable that identifies the number of CPUs your computer has and tells the parallel call to use roughly 80% of them. The function applies iNEXT in parallel via pbmclapply() from the pbmcapply package. The only way I could get this to work was with an lapply-style interface, so the data have to be converted so that each sample is its own matrix (or abundance vector) and each of these samples is an element of a list. Then just run the function through pbmclapply() and do whatever downstream iNEXT processing you need.
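As a concrete sketch of that conversion (the names here are assumptions, not from the original comment: otu_table stands in for a taxa-by-sample count matrix):

```r
# Split a taxa-by-sample abundance matrix into a named list,
# one abundance vector per sample, as pbmclapply expects.
otu_table <- matrix(rpois(40, 5), nrow = 10,
                    dimnames = list(NULL, paste0("S", 1:4)))  # toy data
My_Large_Data_Set <- lapply(colnames(otu_table),
                            function(s) otu_table[, s])
names(My_Large_Data_Set) <- colnames(otu_table)
```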
iNEXT_output <- pbmclapply(My_Large_Data_Set, parallel_rarefaction, mc.cores = Upper_Limit_CPU_Cores)
In my case, this makes the analysis run 5x faster and also prevents my computer from crashing.
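If the downstream goal is a single ggplot-ready table, the list returned by pbmclapply can be row-bound into one data frame; this is a minimal sketch assuming each list element is the data frame that parallel_rarefaction returns:

```r
# Combine the per-sample fortify() results into one data frame;
# a sample column records which list element each row came from.
combined_df <- do.call(rbind, Map(function(df, nm) {
  df$sample <- nm
  df
}, iNEXT_output, names(iNEXT_output)))
```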
To amend my previous comment: the iNEXT update to version 3.0.0+ appears to break the parallelization, or at least makes it much slower than it used to be. Reverting to an older version of iNEXT fixes the issue:
require(devtools)
install_version("iNEXT", version = "2.0.19", repos = "http://cran.us.r-project.org")
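To confirm the downgrade took effect before rerunning the analysis, you can check the installed version (a quick sanity check, not part of the original comment):

```r
# Restart R first so the freshly installed version is the one loaded
packageVersion("iNEXT")  # should report '2.0.19' after the downgrade
```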
I run iNEXT and ggiNEXT on 100 samples, and the time the functions require is too long. How can I make them faster or run them in parallel?