JohnsonHsieh / iNEXT

R package for interpolation and extrapolation
https://JohnsonHsieh.github.com/iNEXT

How to run the function faster or parallel? #65

Open · kingtom2016 opened this issue 3 years ago

kingtom2016 commented 3 years ago

I ran iNEXT and ggiNEXT on 100 samples, and the functions take far too long to run. How can I make them faster or run them in parallel?

abrown435 commented 2 years ago

I see this is an old issue, but I'll post my answer here in case anyone else needs to do this:

# Load the packages used below: iNEXT for the rarefaction itself,
# ggplot2 for fortify(), and parallel/pbmcapply for the parallel apply.
library(iNEXT)
library(ggplot2)
library(parallel)
library(pbmcapply)

# Use roughly 80% of the available cores, rounded to an even number.
Max_CPU_Cores <- detectCores()
Upper_Limit_CPU_Cores <- 2 * round((Max_CPU_Cores * 0.8) / 2)

# Parallel rarefaction function.
# A working parallelized wrapper around iNEXT; about 5x faster than before.
parallel_rarefaction <- function(shuffled_data) {
  out_df <- iNEXT(as.vector(shuffled_data), q = 0, datatype = "abundance")
  df <- fortify(out_df, type = 1)
  return(df)
}

This creates a variable that identifies the number of CPUs your computer has and tells the parallel approach to use about 80% of them. The function then applies iNEXT in parallel via pbmclapply() from the pbmcapply package. The only way I could get this to work was with an lapply-style call, so the data have to be converted so that each sample is its own abundance vector (or matrix) and each of these samples is an element of a list (see the sketch below). Then just run the function through pbmclapply and do whatever downstream iNEXT processing you need.
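As a minimal sketch of that conversion step (not from the original post; the matrix name, dimensions, and sample names below are placeholders), assuming your counts start as a samples-by-species abundance matrix:

# Hypothetical samples-by-species abundance matrix (placeholder data).
abundance_matrix <- matrix(
  rpois(100 * 500, lambda = 2), nrow = 100,
  dimnames = list(paste0("Sample", 1:100), paste0("Species", 1:500))
)

# One abundance vector per sample, stored as a named list -- the structure
# that pbmclapply() iterates over below.
My_Large_Data_Set <- asplit(abundance_matrix, 1)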

iNEXT_output <- pbmclapply(My_Large_Data_Set, parallel_rarefaction, mc.cores = Upper_Limit_CPU_Cores)

In my case, this makes the analysis run 5x faster and also prevents my computer from crashing.
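If you need the per-sample results in a single table afterwards, one way (not part of the original answer) is to bind the list of fortified data frames back together, keeping the sample names from the list:

library(dplyr)

# Stack the per-sample data frames into one, with the list names recorded
# in a new "sample_id" column.
combined_df <- bind_rows(iNEXT_output, .id = "sample_id")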

abrown435 commented 2 years ago

To amend my previous comment: it appears that the iNEXT update to version 3.0.0 and later either breaks this parallelization or is simply slower than it used to be. Reverting to an older version of iNEXT fixes the issue.

# install_version() from devtools installs a specific archived CRAN release.
require(devtools)
install_version("iNEXT", version = "2.0.19", repos = "http://cran.us.r-project.org")