Open cbird808 opened 5 years ago
Here's another time-consuming step:
# Use the taxonomic rank and the TAXID as coordinates to assign the scientific name
# in the appropriate field
for (i in 1:(length (higherTaxa))){
  for (j in 1:(length (higherTaxa[[i]]))){
    if (is.na (attributes (higherTaxa[[i]][j])$names)){
      break
    }
    if (attributes (higherTaxa[[i]][j])$names != "no rank"){ # Skip "no rank"
      colIdx <- attributes (higherTaxa[[i]][j])$names
      full[i,colIdx] <- as.character (sciname (id = as.numeric (higherTaxa[[i]][j]), taxdir = TAXDIR, names = ncbi_names))
    }
  }
}
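# parallel version: each worker returns its names; the parent fills in `full`
library (parallel)
fillTax <- function (i, higherTaxa, TAXDIR, ncbi_names) {
  out <- character (0)
  for (j in seq_along (higherTaxa[[i]])){
    colIdx <- attributes (higherTaxa[[i]][j])$names
    if (is.na (colIdx)){
      break
    }
    if (colIdx != "no rank"){ # Skip "no rank"
      out[colIdx] <- as.character (sciname (id = as.numeric (higherTaxa[[i]][j]), taxdir = TAXDIR, names = ncbi_names))
    }
  }
  out # character vector named by taxonomic rank
}
cl <- makeCluster (detectCores ())
clusterExport (cl, "sciname") # make sciname visible on the workers
res <- parLapply (cl, seq_along (higherTaxa), fillTax, higherTaxa = higherTaxa, TAXDIR = TAXDIR, ncbi_names = ncbi_names)
stopCluster (cl)
# Assignments made inside a worker never reach the parent, so write back here
for (i in seq_along (res)){
  if (length (res[[i]]) > 0){
    full[i, names (res[[i]])] <- res[[i]]
  }
}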
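One thing to watch with a cluster: every worker runs in its own R session, so results have to be returned from the worker and merged in the parent. The following is a minimal, self-contained sketch of that pattern on toy data. `toyTaxa`, `toyFull`, `toyName` and `lookupRow` are made-up stand-ins (assumptions, not objects from the script), and `toyName` only imitates what a call to `sciname` would give back.

```r
library(parallel)

# Toy stand-ins (assumptions, not the pipeline's real data):
# each element maps taxonomic rank -> TAXID, like higherTaxa[[i]]
toyTaxa <- list(
  c(genus = "9605", family = "9604"),
  c("no rank" = "131567", genus = "9596")
)
toyFull <- data.frame(genus = c(NA_character_, NA_character_),
                      family = c(NA_character_, NA_character_),
                      stringsAsFactors = FALSE)

# Hypothetical lookup standing in for sciname(); it only tags the TAXID
toyName <- function(id) paste0("taxon_", id)

# Worker function: build and RETURN a character vector named by rank
lookupRow <- function(taxa, nameFun) {
  taxa <- taxa[!is.na(names(taxa)) & names(taxa) != "no rank"]
  vapply(taxa, nameFun, character(1))
}

cl <- makeCluster(2)
rowNames <- parLapply(cl, toyTaxa, lookupRow, nameFun = toyName)
stopCluster(cl)

# Merge in the parent process, where toyFull actually lives
for (i in seq_along(rowNames)) {
  toyFull[i, names(rowNames[[i]])] <- rowNames[[i]]
}
print(toyFull)
```

Passing the data and the lookup function as arguments to `parLapply` avoids having to export them to the workers separately.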
I've started improving this. I have streamlined the processing of charon and the creation of CVT, and have started using furrr to parallelize the time-consuming tasks.
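For anyone following along, a furrr version of such a lookup might look roughly like the sketch below. It is only an illustration, not the code from the rewrite: `slowLookup` and `ids` are hypothetical stand-ins for a slow per-TAXID call, and the worker count of 2 is arbitrary.

```r
library(furrr)

# Hypothetical slow lookup standing in for a call such as sciname()
slowLookup <- function(id) {
  Sys.sleep(0.1)
  paste0("taxon_", id)
}

ids <- c("9605", "9604", "9596", "9443")

# Start two background R sessions (the plan comes from the future package)
future::plan(future::multisession, workers = 2)

# Same shape as purrr::map_chr(), but evaluated on the workers
taxNames <- future_map_chr(ids, slowLookup)

# Switch back to sequential evaluation, which shuts the workers down
future::plan(future::sequential)

print(taxNames)
```

furrr finds `slowLookup` and ships it to the workers on its own, so there is no explicit export step as with `clusterExport`.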
Okay, perfect. This has been on my list for a long time. The script for counting OTUs (bin/CROP_size_fix.sh) is also nasty slow and trivially parallel: the script itself could just be called in parallel on subsets of the data.
While I suspect there are other ways of improving the speed, the time-consuming steps can be parallelized. Note that the README will have to be updated to mention the R parallel package.
Replace apply with parApply:
apply (X = charon, MARGIN = 1, function (x) {assign (x[[1]], x[[2]], as.numeric (x[[4]]), x[[5]], as.numeric (x[[6]]) ) })
#parallel version of apply
library(parallel)
cl <- makeCluster(detectCores())
parApply (cl=cl, charon, 1, function (x) {assign (x[[1]], x[[2]], as.numeric (x[[4]]), x[[5]], as.numeric (x[[6]]) ) })
stopCluster(cl)
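A small sketch of how `parApply` behaves on a character matrix, in case it helps when testing the change. `toyCharon` and `describeRow` are made up for illustration (they are not the script's `charon` or its per-row function), and the `tryCatch` wrapper is just one way to make sure the workers are released if a row fails.

```r
library(parallel)

# Toy stand-in for charon: a character matrix, one record per row
toyCharon <- matrix(
  c("sp1", "Homo", "x", "0.97",
    "sp2", "Pan",  "y", "0.95"),
  ncol = 4, byrow = TRUE
)

# Hypothetical per-row function; it returns a value because side effects
# inside a worker stay in that worker's R session
describeRow <- function(x) {
  sprintf("%s:%s:%.2f", x[[1]], x[[2]], as.numeric(x[[4]]))
}

cl <- makeCluster(2)
res <- tryCatch(
  parApply(cl, toyCharon, 1, describeRow),
  finally = stopCluster(cl)   # release the workers even if a row fails
)
print(res)
```

Each row reaches the function as a character vector, which is why the numeric fields still need `as.numeric`, just as in the serial `apply` call.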