cbg-ethz / TiMEx

Bioconductor package for finding mutually exclusive groups of alterations in large cancer datasets

Parallel processing with TiMEx? #5

Open Tato14 opened 5 years ago

Tato14 commented 5 years ago

Hi! I have a large matrix to test for mutually exclusive events with TiMEx. I would like to know whether I could use the parallel package (https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf) to speed up the mutual exclusivity calculation.

Any ideas how to do it? Thanks!

csimona commented 5 years ago

Hi,

The R package parallel is definitely suited for running TiMEx on a large matrix. Here is an example on the pre-loaded ovarian dataset, for step 1 of the TiMEx procedure (the pairwise testing). Step 3 (the testing of candidates) can be implemented similarly.


# load necessary libraries and data and prepare for parallelization
library(TiMEx)
data(ovarian)
library(parallel)
# leave one core free, but always use at least one worker
no_cores <- max(1, detectCores() - 1, na.rm = TRUE)
cl <- makeCluster(no_cores)
clusterExport(cl, "analyzePairs")
clusterExport(cl, "ovarian")

# column indices for all pairs of genes (one pair per column of combs)
combs <- combn(ncol(ovarian),2)

# step 1 of the TiMEx procedure: analyze all pairs in parallel
start_time <- Sys.time()
mu.p.pairs <- parApply(cl, combs, 2, function(x) {
  p <- analyzePairs(ovarian[, x])
  # keep the estimated mu and the uncorrected LRT p-value for this pair
  c(p$muEstSym[1, 2], p$pvalueLRTCorrectSym$uncorrected[1, 2])
})
end_time <- Sys.time()
end_time - start_time
stopCluster(cl)  # shut down the workers when no further parallel steps follow

# reshape the results into the list structure that doMaxCliques expects in step 2
pairs.parallel <- list("muEstSym" = matrix(NA_real_, ncol(ovarian), ncol(ovarian)), "pvalueLRTCorrectSym" = list())

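# combn() lists pairs in the same column-major order that lower.tri() fills,
# so the values line up; pmax() with the transpose then mirrors them
pairs.parallel$muEstSym[lower.tri(pairs.parallel$muEstSym, diag = FALSE)] <- mu.p.pairs[1,]
pairs.parallel$muEstSym <- pmax(pairs.parallel$muEstSym, t(pairs.parallel$muEstSym), na.rm = TRUE)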
rownames(pairs.parallel$muEstSym) <- colnames(ovarian)
colnames(pairs.parallel$muEstSym) <- colnames(ovarian)

# same layout for the uncorrected p-values
pairs.parallel$pvalueLRTCorrectSym$uncorrected <- matrix(NA_real_, ncol(ovarian), ncol(ovarian))
pairs.parallel$pvalueLRTCorrectSym$uncorrected[lower.tri(pairs.parallel$pvalueLRTCorrectSym$uncorrected, diag = FALSE)] <- mu.p.pairs[2,]
pairs.parallel$pvalueLRTCorrectSym$uncorrected <- pmax(pairs.parallel$pvalueLRTCorrectSym$uncorrected, t(pairs.parallel$pvalueLRTCorrectSym$uncorrected), na.rm = TRUE)

# output of end_time - start_time from the run above:
# Time difference of 15.94725 secs
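
The reshaping step relies on combn() producing pairs in the same order in which lower.tri() fills a matrix. The following is a minimal base R sketch to check that assumption on a toy example. It does not use TiMEx or a cluster, and the per-pair value (10 * i + j) is a hypothetical stand-in for the statistics returned by analyzePairs.

```r
# Toy check (base R only): combn() pair order matches lower.tri() fill order.
n <- 5
combs <- combn(n, 2)  # 2 x choose(n, 2) matrix of index pairs with i < j

# hypothetical stand-in for a per-pair statistic: encodes the pair as 10 * i + j
vals <- apply(combs, 2, function(x) 10 * x[1] + x[2])

# fill the lower triangle, then mirror it to obtain a symmetric matrix
m <- matrix(NA_real_, n, n)
m[lower.tri(m, diag = FALSE)] <- vals
m <- pmax(m, t(m), na.rm = TRUE)

# every pair (i, j) should find its own value in both triangles
ok <- all(apply(combs, 2, function(x) {
  expected <- 10 * x[1] + x[2]
  m[x[1], x[2]] == expected && m[x[2], x[1]] == expected
}))
print(ok)
```

If the check passes, the values computed per column of combs can be written straight into the lower triangle, as done above for muEstSym and the uncorrected p-values. The diagonal stays NA because no gene is paired with itself.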