DavisLaboratory / singscore

An R/Bioconductor package that implements a single-sample molecular phenotyping approach
https://davislaboratory.github.io/singscore/

Do we really need BiocParallel for generateNull()? #2

Closed MomenehForoutan closed 5 years ago

MomenehForoutan commented 6 years ago

I have been getting some errors using bplapply in this package, similar to the one raised here: "https://support.bioconductor.org/p/92587/". Would it be a problem if we used ldply() instead of bplapply()? Here is what I would suggest: it gave essentially the same results and was even faster, although I used it within a foreach loop. Can you please check this on the example data?

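# Requires plyr (ldply), GSEABase (GeneSet) and singscore (simpleScore)
library(plyr)
library(GSEABase)
library(singscore)

generateNull <- function(n_up, n_down, rankData, B = 1000, seed = 1){
  all_genes <- rownames(rankData)
  totalNo <- n_up + n_down

  # Fix the seed so the B random gene sets are reproducible
  set.seed(seed)

  # Draw all B random gene sets up front, before any scoring
  temSets <- lapply(1:B, function(x) {
    sample(all_genes, size = totalNo, replace = FALSE)
  })
  # Score each random set; ldply stacks the B score vectors into a data frame
  r <- ldply(1:B, function(x) {
    tms <- temSets[[x]]

    if (n_down > 0) {
      # The first n_up genes form the up-set, the remaining genes the down-set
      upSet <- GeneSet(as.character(tms[1:n_up]))
      downSet <- GeneSet(as.character(tms[-(1:n_up)]))
      ss <- simpleScore(rankData, upSet = upSet, downSet = downSet)
    } else {
      # No down-set: all randomly drawn genes go into upSet
      ss <- simpleScore(rankData, upSet = GeneSet(as.character(tms)))
    }
    # Keep the first column of the simpleScore result (one value per sample)
    ss[, 1]
  })
  colnames(r) <- colnames(rankData)
  return(r)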
}
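
One way to check that swapping the apply function does not change the null distribution is sketched below in base R only. Everything here is an assumption made for illustration: `toy_score()` is a hypothetical stand-in for `simpleScore()`, `generate_null_sketch()` and `loop_apply()` are made-up names, and the rank matrix is simulated. Because all random gene sets are drawn under one seed before any scoring happens, any lapply-like backend that takes `(X, FUN)` should give the same scores.

```r
# Toy rank matrix: 50 hypothetical genes x 3 hypothetical samples
set.seed(42)
rank_data <- matrix(
  replicate(3, sample(50)),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("sample", 1:3))
)

# Hypothetical stand-in for simpleScore(): mean rank of the set's genes,
# divided by the number of genes so that scores fall in (0, 1]
toy_score <- function(rank_data, gene_set) {
  colMeans(rank_data[gene_set, , drop = FALSE]) / nrow(rank_data)
}

# apply_fun is any lapply-like function called as apply_fun(X, FUN)
generate_null_sketch <- function(n_genes, rank_data, B = 100, seed = 1,
                                 apply_fun = lapply) {
  set.seed(seed)
  # Draw all B random gene sets first, so the backend cannot affect sampling
  gene_sets <- lapply(seq_len(B), function(i) {
    sample(rownames(rank_data), size = n_genes, replace = FALSE)
  })
  scores <- apply_fun(gene_sets, function(gs) toy_score(rank_data, gs))
  # One row per random gene set, one column per sample
  do.call(rbind, scores)
}

# A second backend: a plain for loop wrapped to look like lapply
loop_apply <- function(X, FUN) {
  out <- vector("list", length(X))
  for (i in seq_along(X)) out[[i]] <- FUN(X[[i]])
  out
}

null_lapply <- generate_null_sketch(5, rank_data, apply_fun = lapply)
null_loop   <- generate_null_sketch(5, rank_data, apply_fun = loop_apply)

# Compare the two null matrices element by element
same_result <- identical(null_lapply, null_loop)
print(same_result)
```

`BiocParallel::bplapply` can also be called as `bplapply(X, FUN)`, so it could in principle be passed as `apply_fun` in the same way. That has not been tried here.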

ruqianl commented 6 years ago

Hi Sepideh,

I just did a test run of the two versions (with and without BiocParallel), and it was actually faster with BiocParallel on my MacBook.

Time difference of 36.55549 secs (with BiocParallel) vs. Time difference of 1.126696 mins (without)
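
The two figures are printed in different units (secs and mins), which is what R does when a time difference picks its units automatically. The thread does not show how the timings were taken, so the sketch below simply rebuilds the two reported values as `difftime` objects and converts both to seconds so they can be compared directly. The variable names are made up for illustration.

```r
# Rebuild the two reported timings as difftime objects
elapsed_fast <- as.difftime(36.55549, units = "secs")
elapsed_slow <- as.difftime(1.126696, units = "mins")

# Convert both to seconds before comparing
fast_secs <- as.numeric(elapsed_fast, units = "secs")
slow_secs <- as.numeric(elapsed_slow, units = "secs")

# How many times longer the slower run took
ratio <- slow_secs / fast_secs
print(c(fast_secs = fast_secs, slow_secs = slow_secs, ratio = ratio))
```

In seconds, the slower run is about 67.6 s, a bit under twice the faster one.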

I haven't tested on Windows, but I will look into that. Thanks!

ruqianl commented 6 years ago

I figured out what was causing the error and found a solution for it.