Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

bplapply: cryptic output on windows #70

Closed by lgeistlinger 6 years ago

lgeistlinger commented 6 years ago

I'm carrying out a differential expression analysis with EnrichmentBrowser::de.ana on the airway dataset under Windows (sessionInfo at the end of this post). A single run works fine:

# setup
> library(EnrichmentBrowser)
> data(airway, package="airway")

# define sample groups according to treatment
> airway$GROUP <- ifelse(airway$dex == "trt", 1, 0)

# single run ... 
> airway <- de.ana(airway)

# ... works fine
> rowData(airway)
DataFrame with 12937 rows and 3 columns
               FC   ADJ.PVAL  limma.STAT
        <numeric>  <numeric>   <numeric>
1     -0.39678452  0.1175037  -2.6289729
2      0.18155416  0.1604414   2.3242126
3      0.01856915  0.9281188   0.1901358
4     -0.08166115  0.8860783  -0.2907636
5      0.39335015  0.3680498   1.4834379
...           ...        ...         ...
12933 -0.10790287 0.66892152 -0.75516714
12934  0.48963694 0.01590157  4.77114738
12935 -0.01803423 0.97715363 -0.06459359
12936 -0.21721561 0.49313085 -1.16206137
12937 -0.03015531 0.95643603 -0.11880231

However, when carrying this out in parallel via:

> elist <- rep(list(airway), 5)
> elist <- BiocParallel::bplapply(elist, de.ana)

I'm getting some cryptic (and very long) output on stdout:

 <environment: namespace:base>
          cpu
          elapsed
          transient
        class
            <environment: namespace:EnrichmentBrowser>
          key
          value
            <environment: namespace:stats>
          x
          ...
        class
            <environment: namespace:EnrichmentBrowser>
          expr
          na.rm
            <environment: namespace:EnrichmentBrowser>
          pkg
          type
  all.available
  suppressUpdates
  suppressAutoUpdate
          character.only
            <environment: namespace:stats>
          object
          ...
        class
            <environment: namespace:stats>
          p
          method
          n
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
            <environment: namespace:compiler>
          start.op
          dflt.op
          afun
          place
          call
          cb
          cntxt
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
        class
            <environment: namespace:S4Vectors>
          ...
          row.names
          check.names
silent
use.names
length.out
drop
    recursive
    use.names
  unique
          listData
          rownames
          nrows
          check
            <environment: namespace:S4Vectors>
          i
          x
          exact
          error.if.nomatch
    incomparables
    duplicates.ok
            <environment: namespace:S4Vectors>
          i
          x
          exact
          allow.append
          allow.NAs
          as.NSBS
    exact
    strict.upper.bound
    allow.NAs
    exact
    strict.upper.bound
    allow.NAs
            <environment: namespace:S4Vectors>
          subscript
          upper_bound
          upper_bound_is_strict
          has_NAs
            subscript
            upper_bound
            upper_bound_is_strict
            has_NAs
            check
            <environment: namespace:base>
          expr
        class
        class
          [1]
        w
          [2]
    class
          [3]
        class
        class
            <environment: namespace:base>
          ...
        class
        class
            <environment: namespace:S4Vectors>
          debug
      envir
          envir
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
    collapse
    collapse
collapse
            <environment: namespace:S4Vectors>
          x
            PACKAGE
            <environment: namespace:S4Vectors>
          x
    collapse
            <environment: namespace:S4Vectors>
          x
    collapse
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
xi
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:S4Vectors>
          x
            <environment: namespace:stats>
          x
          env
          ...
        class
        class
        class
        class
            <environment: namespace:stats>
...

How can I get rid of this output?

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] EnrichmentBrowser_2.8.6    pathview_1.18.2           
 [3] org.Hs.eg.db_3.5.0         AnnotationDbi_1.40.0      
 [5] graph_1.56.0               SummarizedExperiment_1.8.1
 [7] DelayedArray_0.4.1         matrixStats_0.53.1        
 [9] Biobase_2.38.0             GenomicRanges_1.30.1      
[11] GenomeInfoDb_1.14.0        IRanges_2.12.0            
[13] S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2              hwriter_1.3.2                
  [3] biovizBase_1.26.0             htmlTable_1.11.2             
  [5] XVector_0.18.0                base64enc_0.1-3              
  [7] dichromat_2.0-0               rstudioapi_0.7               
  [9] bit64_0.9-7                   interactiveDisplayBase_1.16.0
 [11] R.methodsS3_1.7.1             splines_3.4.3                
 [13] ggbio_1.26.0                  geneplotter_1.56.0           
 [15] knitr_1.19                    Formula_1.2-2                
 [17] Rsamtools_1.30.0              annotate_1.56.1              
 [19] cluster_2.0.6                 GO.db_3.5.0                  
 [21] R.oo_1.21.0                   png_0.1-7                    
 [23] shiny_1.0.5                   compiler_3.4.3               
 [25] httr_1.3.1                    GOstats_2.44.0               
 [27] backports_1.1.2               assertthat_0.2.0             
 [29] Matrix_1.2-12                 lazyeval_0.2.1               
 [31] limma_3.34.8                  acepack_1.4.1                
 [33] htmltools_0.3.6               prettyunits_1.0.2            
 [35] tools_3.4.3                   gtable_0.2.0                 
 [37] GenomeInfoDbData_1.0.0        Category_2.44.0              
 [39] reshape2_1.4.3                rappdirs_0.3.1               
 [41] Rcpp_0.12.15                  Biostrings_2.46.0            
 [43] rtracklayer_1.38.3            stringr_1.2.0                
 [45] mime_0.5                      ensembldb_2.2.1              
 [47] XML_3.98-1.9                  AnnotationHub_2.10.1         
 [49] edgeR_3.20.8                  zlibbioc_1.24.0              
 [51] scales_0.5.0                  BSgenome_1.46.0              
 [53] BiocInstaller_1.28.0          VariantAnnotation_1.24.5     
 [55] ProtGenerics_1.10.0           RBGL_1.54.0                  
 [57] KEGGgraph_1.38.0              AnnotationFilter_1.2.0       
 [59] RColorBrewer_1.1-2            curl_3.1                     
 [61] yaml_2.1.16                   memoise_1.1.0                
 [63] gridExtra_2.3                 ggplot2_2.2.1                
 [65] biomaRt_2.34.2                rpart_4.1-12                 
 [67] reshape_0.8.7                 latticeExtra_0.6-28          
 [69] stringi_1.1.6                 RSQLite_2.0                  
 [71] genefilter_1.60.0             RMySQL_0.10.13               
 [73] checkmate_1.8.5               GenomicFeatures_1.30.3       
 [75] BiocParallel_1.12.0           ReportingTools_2.17.3        
 [77] rlang_0.1.6                   pkgconfig_2.0.1              
 [79] bitops_1.0-6                  lattice_0.20-35              
 [81] GenomicAlignments_1.14.1      htmlwidgets_1.0              
 [83] bit_1.1-12                    GSEABase_1.40.1              
 [85] AnnotationForge_1.20.0        GGally_1.3.2                 
 [87] plyr_1.8.4                    magrittr_1.5                 
 [89] DESeq2_1.18.1                 R6_2.2.2                     
 [91] snow_0.4-2                    Hmisc_4.1-1                  
 [93] DBI_0.7                       pillar_1.1.0                 
 [95] foreign_0.8-69                survival_2.41-3              
 [97] KEGGREST_1.18.1               RCurl_1.95-4.10              
 [99] nnet_7.3-12                   tibble_1.4.2                 
[101] OrganismDbi_1.20.0            PFAM.db_3.5.0                
[103] progress_1.1.2                locfit_1.5-9.1               
[105] grid_3.4.3                    data.table_1.10.4-3          
[107] blob_1.1.0                    Rgraphviz_2.22.0             
[109] digest_0.6.15                 xtable_1.8-2                 
[111] httpuv_1.3.5                  R.utils_2.6.0                
[113] munsell_0.4.3                

mtmorgan commented 6 years ago

Sorry to be slow in getting to this. Do you still see the problem? Can you be more specific about how you are running this code? Also, can you provide a minimal example? Your sessionInfo() lists many more packages than your code implies.

For instance, I have a script BiocParallel-issue-70.R

data("airway", package="airway")
airway$GROUP <- ifelse(airway$dex == "trt", 1, 0)

elist <- rep(list(airway), 5)
myAna <- function(x) {
    library(SummarizedExperiment)  ## to tackle a separate problem; mentioned below
    EnrichmentBrowser::deAna(x)
}
elist <- BiocParallel::bptry(BiocParallel::bplapply(elist, myAna))
elist
sessionInfo()

And I run it from the PowerShell command line as

PS > R.exe -f .\BiocParallel-issue-70.R
...
> sessionInfo()
R version 3.5.1 RC (2018-06-24 r74929)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 R2 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] SummarizedExperiment_1.11.5 DelayedArray_0.7.15
 [3] BiocParallel_1.15.6         matrixStats_0.53.1
 [5] Biobase_2.41.1              GenomicRanges_1.33.6
 [7] GenomeInfoDb_1.17.1         IRanges_2.15.14
 [9] S4Vectors_0.19.16           BiocGenerics_0.27.1

loaded via a namespace (and not attached):
 [1] lattice_0.20-35        snow_0.4-2             bitops_1.0-6
 [4] grid_3.5.1             zlibbioc_1.27.0        XVector_0.21.3
 [7] Matrix_1.2-14          tools_3.5.1            RCurl_1.95-4.10
[10] compiler_3.5.1         GenomeInfoDbData_1.1.0
>

myAna() attaches SummarizedExperiment inside the function so that it is on each worker's search path. Otherwise the workers see the equivalent of the following (in a new R session):

> loadNamespace("SummarizedExperiment")  # load but not attach
<environment: namespace:SummarizedExperiment>
> data("airway", package="airway")       # happy that SE is loaded
> airway[1:5,]    # `[,SummarizedExperiment` not found -- not on search path
Error in x@assays[ii, ] : object of type 'S4' is not subsettable
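
The difference between a loaded and an attached namespace can also be checked without any Bioconductor packages. The following is only a sketch: it uses the base-R `tools` package as a stand-in for SummarizedExperiment, and the variable names are made up for illustration. It records whether the namespace is loaded and whether `package:tools` appears on the search path after `loadNamespace()` and again after `library()`.

```r
# Sketch: "loaded" vs "attached", with base-R 'tools' standing in
# for SummarizedExperiment (an assumption made to keep this self-contained).
pkg <- "tools"
entry <- paste0("package:", pkg)

# Load the namespace only; this does not add it to the search path.
invisible(loadNamespace(pkg))
loaded_after_load <- isNamespaceLoaded(pkg)

# TRUE here only if the package had already been attached earlier in the session.
attached_after_load <- entry %in% search()

# library() loads the namespace if needed and attaches it to the search path.
library(pkg, character.only = TRUE)
attached_after_library <- entry %in% search()

print(c(loaded_after_load = loaded_after_load,
        attached_after_load = attached_after_load,
        attached_after_library = attached_after_library))
```

Calling `library()` inside the function passed to `bplapply()`, as `myAna()` does, applies the same attach step in each worker process.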

lgeistlinger commented 6 years ago

Using R-3.5.0 and Bioconductor 3.7, I'm not having this issue anymore. Thanks!