YuLab-SMU / DOSE

:mask: Disease Ontology Semantic and Enrichment analysis
https://yulab-smu.top/biomedical-knowledge-mining-book/
114 stars 35 forks source link

gseDO() example doesn't run #16

Closed guidohooiveld closed 7 years ago

guidohooiveld commented 7 years ago

I found a bug when running gseDO(). Reproducible example (from https://guangchuangyu.github.io/2016/07/leading-edge-analysis/ , except that I did not use the default "by='fgsea'").

Thanks for having a look at this. G

> library(DOSE)

> data(geneList)
> x <- gseDO(geneList,by="DOSE")
[1] "preparing geneSet collections..."
[1] "calculating observed enrichment scores..."
[1] "calculating permutation scores..."
Error: 'mcparallel' is not an exported object from 'namespace:parallel'
In addition: Warning message:
In MulticoreParam(workers, progressbar = verbose) :
  MulticoreParam() not supported on Windows, use SnowParam()
> sessionInfo()
R version 3.3.1 Patched (2016-10-18 r71535)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DOSE_3.0.0

loaded via a namespace (and not attached):
 [1] igraph_1.0.1         Rcpp_0.12.7          AnnotationDbi_1.36.0
 [4] magrittr_1.5         splines_3.3.1        BiocGenerics_0.20.0 
 [7] IRanges_2.8.0        munsell_0.4.3        BiocParallel_1.8.0  
[10] colorspace_1.2-7     fastmatch_1.0-4      stringr_1.1.0       
[13] plyr_1.8.4           tools_3.3.1          parallel_3.3.1      
[16] grid_3.3.1           Biobase_2.34.0       data.table_1.9.6    
[19] gtable_0.2.0         DBI_0.5-1            GOSemSim_2.0.0      
[22] gridExtra_2.2.1      reshape2_1.4.1       DO.db_2.9           
[25] ggplot2_2.1.0        S4Vectors_0.12.0     qvalue_2.6.0        
[28] RSQLite_1.0.0        stringi_1.1.2        fgsea_1.0.0         
[31] GO.db_3.4.0          scales_0.4.0         stats4_3.3.1        
[34] chron_2.3-47        
>
GuangchuangYu commented 7 years ago

I can't reproduce your issue.

> require(DOSE)
Loading required package: DOSE

> packageVersion("DOSE")
[1] ‘3.0.0’
> data(geneList)
> x = gseDO(geneList)
[1] "preparing geneSet collections..."
[1] "GSEA analysis..."
[1] "leading edge analysis..."
[1] "done..."
> head(x, 2)
                 ID            Description setSize enrichmentScore       NES
DOID:1492 DOID:1492 eye and adnexa disease     459      -0.3105160 -1.382493
DOID:5614 DOID:5614            eye disease     450      -0.3125247 -1.390034
               pvalue   p.adjust    qvalues rank                   leading_edge
DOID:1492 0.001230012 0.03797232 0.02535098 1793 tags=22%, list=14%, signal=19%
DOID:5614 0.001239157 0.03797232 0.02535098 1768 tags=22%, list=14%, signal=19%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     core_enrichment
DOID:1492 3371/3082/5914/2878/4153/3791/23247/1543/80184/6750/1958/2098/7450/596/9187/2034/482/948/1490/1280/3931/5737/4314/4881/2261/3426/187/629/6403/7042/6785/7507/2934/5176/4060/1277/7078/5950/2057/727/10516/4311/2247/1295/358/10203/2192/582/10218/57125/3485/585/1675/6310/2202/4313/2944/4254/3075/1501/2099/3480/4653/1195/6387/3305/1471/857/4016/1909/4053/6678/1296/7033/4915/55812/1191/5654/10631/2152/2697/7043/2952/6935/2200/3572/7177/7031/3479/2006/10451/9370/771/3117/125/652/4693/5346/1524
DOID:5614           3082/5914/2878/4153/3791/23247/1543/80184/6750/1958/2098/7450/596/9187/2034/482/948/1490/1280/3931/5737/4314/4881/2261/3426/187/629/6403/7042/6785/7507/2934/5176/4060/1277/7078/5950/2057/727/10516/4311/2247/1295/358/10203/2192/582/10218/57125/3485/585/1675/6310/2202/4313/2944/4254/3075/1501/2099/3480/4653/6387/3305/1471/857/4016/1909/4053/6678/1296/7033/4915/55812/1191/5654/10631/2152/2697/7043/2952/6935/2200/3572/7177/7031/3479/2006/10451/9370/771/3117/125/652/4693/5346/1524

From your error msg: Error: 'mcparallel' is not an exported object from 'namespace:parallel', it seems that DOSE import mcparallel from parallel, but actually I never import it, https://github.com/GuangchuangYu/DOSE/blob/master/NAMESPACE.

For parallel, DOSE use BiocParallel package as backend, which also not import mcparallel, see https://github.com/Bioconductor/BiocParallel/blob/master/NAMESPACE.

guidohooiveld commented 7 years ago

Thanks for having a look at this. I got your point. Unfortunately, exact same error still persists, even when running a freshly installed R v3.3.1 (thus not patched v3.3.1). I only ran the R-installer, followed by biocLite("DOSE"). That was all. Could this be a Win-specific problem?

GuangchuangYu commented 7 years ago

It could be. You can file an issue of BiocParallel in bioconductor support site.

guidohooiveld commented 7 years ago

OK, will do so.

mtmorgan commented 7 years ago

The DOSE code at https://github.com/GuangchuangYu/DOSE/blob/8b57eec7965d58fc7c84a63d6383707484803aeb/R/gsea.R#L211 does

bpstart(MulticoreParam())

which generates first the warning and then error (use options(warn=1) to seem them in correct temporal sequence) -- it is trying to start a multicore back-end on unsupported windows.

Error: 'mcparallel' is not an exported object from 'namespace:parallel'
In addition: Warning message:
In MulticoreParam(workers, progressbar = verbose) :
  MulticoreParam() not supported on Windows, use SnowParam()

The intention is to let the user specify the back-end either implicitly or by register()ing the desired back-end. So code use might be

if (!bpisup()) {
    bpstart()
   on.exit(bpstop())
}
bplapply(1:nperm,  function(i) {
        if (seed)
            set.seed(seeds[i])
        perm.gseaEScore(geneList, selected.gs, exponent)
})

(or just bplapply(), since the cluster is used only for the evaluation).

GuangchuangYu commented 7 years ago

@mtmorgan thanks for pointing it out.

@guidohooiveld pls test with the github version, it should be fixed.

guidohooiveld commented 7 years ago

@GuangchuangYu: yes, working now (although you may want to check the message that are returned).

> packageVersion("DOSE")
[1] ‘3.1.1’
>

> library(DOSE)

> data(geneList)
> x <- gseDO(geneList,by="DOSE")
preparing geneSet collections...
calculating observed enrichment scores...
calculating permutation scores...
              <environment: namespace:base>
            cpu
            elapsed
            transient
              <environment: namespace:base>
            expr
            ...
              <environment: namespace:DOSE>
            geneList
            geneSets
            exponent
            [1]
          i
            [2]
              <environment: namespace:DOSE>
            geneList
              <environment: namespace:DOSE>
            geneList
            geneSet
            exponent
            fortify
              <environment: namespace:base>
            x
            row.names
            optional
            ...
            nm
        width.cutoff
          collapse
              <environment: namespace:base>
            x
            name
            value
              <environment: namespace:base>
            cpu
            elapsed
            transient
              <environment: namespace:base>
            expr
            ...
              <environment: namespace:DOSE>
            geneList
            geneSets
            exponent
            [1]
          i
            [2]
              <environment: namespace:DOSE>
            geneList
              <environment: namespace:DOSE>
            geneList
            geneSet
            exponent
            fortify
              <environment: namespace:base>
            x
            row.names
            optional
            ...
            nm
        width.cutoff
          collapse
              <environment: namespace:base>
            x
            name
            value
calculating p values...
leading edge analysis...
done...
> head(x,2)
                       ID                      Description setSize enrichmentScore
DOID:5679       DOID:5679                  retinal disease     299      -0.3676313
DOID:0060040 DOID:0060040 pervasive developmental disorder     175      -0.3909104
                   NES      pvalue   p.adjust    qvalues rank
DOID:5679    -1.575727 0.001288660 0.03595402 0.02395644 1768
DOID:0060040 -1.570647 0.001404494 0.03595402 0.02395644 2308
                               leading_edge
DOID:5679    tags=24%, list=14%, signal=21%
DOID:0060040 tags=30%, list=18%, signal=25%
                                                                                                                                                                                                                                                                                                                                                                  core_enrichment
DOID:5679    2878/3791/23247/80184/6750/7450/596/9187/2034/482/948/1490/1280/5737/4314/4881/3426/187/629/6403/6785/2934/5176/7078/5950/727/10516/4311/2247/1295/358/10203/582/10218/57125/585/1675/6310/2202/4313/2944/4254/3075/2099/3480/4653/6387/1471/857/4016/1909/4053/6678/1296/4915/55812/1191/5654/10631/2697/2952/6935/2200/3479/2006/10451/9370/771/652/4693/5346/1524
DOID:0060040                                                                                                 1760/9732/7337/5175/6532/54806/6326/1499/7157/221037/627/3399/2571/3082/3791/27347/596/22829/23426/324/5021/4885/7248/8604/3397/4208/3400/26470/64221/9037/3952/93664/2944/7102/2550/4915/4922/26960/1746/2697/6863/2891/367/4128/7166/6505/5348/18/9370/57502/79083
>
GuangchuangYu commented 7 years ago

Now I only use:

https://github.com/GuangchuangYu/DOSE/blob/master/R/gsea.R#L217-L221

    permScores <- bplapply(1:nPerm, function(i) {
        if (seed)
            set.seed(seeds[i])
        perm.gseaEScore(geneList, selected.gs, exponent)
})

and let bpparam() to select proper BPPARAM (registered()[[1]]).

I can't reproduce the unexpected message on Mac OSX.

@mtmorgan Is this an issue of BiocParallel on Windows?

To my knowledge, it may due to importing function improperly. This is quite likely, as BiocParallel use :: and ::: quite often.

~/g/o/B/R ❯❯❯ cat * |grep '::'  
            prev.config <- BatchJobs::getConfig()
            on.exit(do.call(BatchJobs::setConfig, prev.config))
            new.conf <- unclass(do.call(BatchJobs::setConfig,
setMethod("bpbackend", "BatchJobsParam", function(x) BatchJobs::getConfig())
    prev.config <- BatchJobs::getConfig()
    on.exit(BatchJobs::setConfig(conf=prev.config), add=TRUE)
    BatchJobs::setConfig(conf=BPPARAM$conf.pars)
        reg <- do.call(BatchJobs::makeRegistry, reg.pars)
        ids <- BatchJobs::batchMap(
        } else BBmisc::chunk(ids, n.chunks=bpnworkers(BPPARAM), shuffle=TRUE)
        do.call(BatchJobs::submitJobs, submit.pars)
        BatchJobs::waitForJobs(reg, ids, timeout=30L * 24L * 60L * 60L,
        BatchJobs::loadResults(reg, ids, use.names="none")
        foreach::getDoParWorkers()
    isNamespaceLoaded("foreach") && foreach::getDoParRegistered() &&
        (foreach::getDoParName() != "doSEQ") &&
        (foreach::getDoParWorkers() > 1L)
    `%dopar%` <- foreach::`%dopar%`
        foreach::foreach(X=X, .errorhandling=handle) %dopar% FUN(X, ...)
            max(1L, parallel::detectCores() - 2L)
    cores <- max(1L, parallel::detectCores() - 2L)
        bpbackend(x) <- do.call(parallel::makeCluster, cargs)
            parallel::clusterSetRNGStream(bpbackend(x), bpRNGseed(x))
            parallel::clusterExport(bpbackend(x), "timeout", env=environment())
        parallel::stopCluster(bpbackend(x))
        BatchJobs:::checkDir(bpresultdir(BPPARAM))
        BatchJobs:::checkDir(bplogdir(BPPARAM))
        BatchJobs:::checkDir(bpresultdir(BPPARAM))
        BatchJobs:::checkDir(bplogdir(BPPARAM))
### parallel::SOCKcluster types
    parallel:::stopCluster.default
### snow::MPI
    Rmpi::mpi.comm.get.parent(intercomm)
    Rmpi::mpi.intercomm.merge(intercomm,1,comm)
    Rmpi::mpi.comm.set.errhandler(comm)
    Rmpi::mpi.comm.disconnect(intercomm)
    bploop(snow::makeMPImaster(comm))
    Rmpi::mpi.comm.disconnect(comm)
    Rmpi::mpi.quit()
### parallel::FORK
    parallel::mcparallel({
        parallel::clusterApply(cl, seq_along(cl), .bufferload,
    parallel:::sendData(node, data)
    parallel:::sendData(node, data)
        parallel:::recvData(node)
        parallel:::recvOneData(cluster)
    parallel:::closeNode(node)
## bploop.lapply(): derived from snow::dynamicClusterApply.
## Derived from snow::dynamicClusterApply and parallel::mclapply.
    unknown <- codetools::findGlobals(fun)
    futile.logger::flog.threshold(threshold)
    futile.logger::flog.info("loading futile.logger package")
    futile.logger::flog.appender(.log_buffer_append, 'ROOT')
        futile.logger::flog.warn(fmt, ...)
        futile.logger::flog.error(fmt, ...)
## derived from plyr::progress_text()
    multicore <- (parallel::detectCores() - 2L) > 1L
~/g/o/B/R ❯❯❯ cat * |grep ':::' 
        BatchJobs:::checkDir(bpresultdir(BPPARAM))
        BatchJobs:::checkDir(bplogdir(BPPARAM))
        BatchJobs:::checkDir(bpresultdir(BPPARAM))
        BatchJobs:::checkDir(bplogdir(BPPARAM))
    parallel:::stopCluster.default
    parallel:::sendData(node, data)
    parallel:::sendData(node, data)
        parallel:::recvData(node)
        parallel:::recvOneData(cluster)
    parallel:::closeNode(node)
mtmorgan commented 7 years ago

If you register(SnowParam()) you'll be getting the same backend as on Windows, this might help to emulate the problem Guido is reporting. The output looks something like ls() or ??? It could be some left-over de-bugging code in BiocParallel, but I'm not able to reproduce it outside of DOSE.

guidohooiveld commented 7 years ago

For the archive: I noticed this issue has been resolved in DOSE v3.0.4, which also was confirmed by Guangchuang here: https://github.com/GuangchuangYu/ReactomePA/issues/11#issuecomment-260289422