Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

Parallelization problems on Ubuntu #106

Open c-mertes opened 4 years ago

c-mertes commented 4 years ago

I'm not sure where the problem is; maybe you can help find the root of it.

When using OUTRIDER on a CentOS 7.7 machine, MulticoreParam works perfectly. But on Ubuntu it gets stuck after the first bplapply call and does not return anything. With SerialParam it runs through normally on both CentOS and Ubuntu. It does not matter whether we use Multicore or Snow. Could the multithreaded BLAS and LAPACK versions on Ubuntu cause the problem?

For more details have a look here https://github.com/gagneurlab/OUTRIDER/issues/18

This is how to reproduce the problem on my laptop:

if (!requireNamespace("OUTRIDER", quietly=TRUE))
    BiocManager::install("OUTRIDER")

library(OUTRIDER)
download.file("https://github.com/gagneurlab/OUTRIDER/files/3898551/all_fib_cts.gz", "cts.gz")
ods <- OutriderDataSet(countData=read.table("cts.gz"))
ods <- filterExpression(ods, minCounts=TRUE)

register(MulticoreParam(2, 20, progressbar=TRUE))
ods <- OUTRIDER(ods, verbose=TRUE)

Thanks for any help.

mtmorgan commented 4 years ago

I'm not able to reproduce this. To troubleshoot I'd aim for a simpler example, e.g.,

bplapply(1:5, identity, BPPARAM = MulticoreParam())

with a likely culprit being blocked ports (BiocParallel spawns workers that communicate with the master through sockets; see manager.port on ?MulticoreParam).

lshep commented 4 years ago

@c-mertes - If we cannot reproduce this and you cannot provide more details, we will close the issue. Are you still encountering this, and is it possible to provide a simpler example as requested?

HenrikBengtsson commented 4 years ago

I can reproduce this on "stock" R 3.6.2 on Ubuntu 18.04 with both MulticoreParam() and SnowParam():

> ods <- OUTRIDER(ods, verbose=TRUE)
Wed Jan 22 06:55:34 2020: SizeFactor estimation ...
Wed Jan 22 06:55:35 2020: Controlling for confounders ...
Using estimated q with: 45
Wed Jan 22 06:55:35 2020: Using the autoencoder implementation for controlling.
  |                                                                      |   0%

If you look at top/htop, you'll see that both of the forked child processes are indeed running, but each of them is running at 100% on every one of your cores (in my case I've got 8 cores, so at 800%). ELI5: The forked multicore workers run wild trying to get time slots on the CPU, which just can't keep up, and you end up clogging the OS as it tries to switch between far too many threads. I'm pretty sure this is due to multi-threading, which typically comes from OpenMP being used by some native code. This is becoming more and more common in R as it gets easier and easier for developers to implement it via the Rcpp ecosystem.

Sure enough, if we force single-threaded OpenMP(*):

RhpcBLASctl::omp_set_num_threads(1L)
register(MulticoreParam(workers=2L, tasks=20L, progressbar=TRUE))
ods <- OUTRIDER(ods, verbose=TRUE)
  |=================================================                     |  70%

It might also work with, say, RhpcBLASctl::omp_set_num_threads(2L).

(*) You might have to restart R first.

The above approach to force single-threaded OpenMP will only work with MulticoreParam. For SnowParam, RhpcBLASctl::omp_set_num_threads(1L) has to be called within every worker. Not sure how to do that in BiocParallel.
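One possible way to do this for SnowParam, as a minimal sketch only: make the call at the top of the function that each worker evaluates. This assumes RhpcBLASctl is installed wherever the workers run; the squaring task is a placeholder for the real work, and whether the setting takes effect still depends on the platform and R build.

```r
library(BiocParallel)

# Assumption: RhpcBLASctl is installed on the machine(s) running the workers.
param <- SnowParam(workers = 2L, type = "SOCK")

res <- bplapply(1:4, function(i) {
    # Runs inside the worker process, so it limits OpenMP there
    # rather than in the master session.
    RhpcBLASctl::omp_set_num_threads(1L)
    i^2   # placeholder for the real computation
}, BPPARAM = param)
```

Another commonly used route is to set the standard OpenMP environment variable OMP_NUM_THREADS=1 before starting R, since worker processes launched from that session inherit its environment.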

Either way, the above is a problem that we will see popping up in more and more code. It will appear "randomly" as more and more packages start parallelizing. The main problem is that developers think they have full access to all cores on the machine, which often stems from using parallel::detectCores() [<<== BAD] or similar (here it's something similar in OpenMP) to decide on the number of workers or number of threads. It is effectively left to the end-user to troubleshoot and deal with this. What makes it worse is that it's very hard for the user to disable this overuse of the CPU (recently I discovered that RhpcBLASctl::omp_set_num_threads(1L) might not work on all platforms/R builds).
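To illustrate the alternative to parallel::detectCores(), here is a rough sketch of a helper that only uses as many workers as the user has explicitly asked for and otherwise stays conservative. The name choose_workers() and its arguments are hypothetical, not part of any package; "mc.cores" is the standard R option that parallel::mclapply() also consults.

```r
# Hypothetical helper: pick a worker count from explicit user settings
# instead of assuming every core on the machine is available.
choose_workers <- function(requested = NULL, fallback = 1L) {
    # 1. An explicit request from the caller wins.
    if (!is.null(requested)) {
        return(max(1L, as.integer(requested)))
    }
    # 2. Otherwise honour the R option "mc.cores" if the user has set it.
    opt <- getOption("mc.cores")
    if (!is.null(opt)) {
        return(max(1L, as.integer(opt)))
    }
    # 3. Otherwise fall back to a conservative default.
    as.integer(fallback)
}

options(mc.cores = 3L)
choose_workers()                  # uses the user's option
choose_workers(requested = 2L)    # explicit request overrides the option
```

A package could pass the result on, for example as the workers argument of MulticoreParam(), so that the end user stays in control of how many processes are started.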

As a starter, I think OUTRIDER needs to document this and provide mechanisms/options/arguments for running in single-threaded mode.

There's probably also room for BiocParallel to do something here, e.g. documentation, collect problematic examples, educate developers, don't use parallel::detectCores(), etc.

In the bigger picture, I think Bioconductor and CRAN need to work together to detect cases of this through their R CMD check runs and report back to developers. Without protection against this, the problem will become more common rather soon. I also think there should be some built-in protection against this in base R, or at least user and developer options for disabling multi-processing/multi-forking/multi-threading.

Reproducible example

One time setup:

if (!requireNamespace("OUTRIDER", quietly=TRUE))
    BiocManager::install("OUTRIDER")
if (!utils::file_test("-f", "cts.gz"))
    download.file("https://github.com/gagneurlab/OUTRIDER/files/3898551/all_fib_cts.gz", "cts.gz", mode = "wb")
library(OUTRIDER)
counts <- read.table("cts.gz")
ods <- OutriderDataSet(countData=counts)
ods <- filterExpression(ods, minCounts=TRUE)

register(MulticoreParam(workers=2L, tasks=20L, progressbar=TRUE))
#register(SnowParam(workers=2L, tasks=20L, type="SOCK", progressbar=TRUE))
#register(SerialParam(progressbar=TRUE))
ods <- OUTRIDER(ods, verbose=TRUE)

Session info

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] OUTRIDER_1.4.0              data.table_1.12.8          
 [3] SummarizedExperiment_1.16.1 DelayedArray_0.12.2        
 [5] matrixStats_0.55.0-9000     GenomicFeatures_1.38.0     
 [7] AnnotationDbi_1.48.0        Biobase_2.46.0             
 [9] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
[11] IRanges_2.20.2              S4Vectors_0.24.3           
[13] BiocGenerics_0.32.0         BiocParallel_1.20.1        

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1         htmlTable_1.13.3         XVector_0.26.0          
  [4] base64enc_0.1-3          rstudioapi_0.10          bit64_0.9-7             
  [7] codetools_0.2-16         splines_3.6.2            PRROC_1.3.1             
 [10] geneplotter_1.64.0       knitr_1.27               zeallot_0.1.0           
 [13] Formula_1.2-3            jsonlite_1.6             Rsamtools_2.2.1         
 [16] annotate_1.64.0          cluster_2.1.0            dbplyr_1.4.2            
 [19] png_0.1-7                pheatmap_1.0.12          compiler_3.6.2          
 [22] httr_1.4.1               backports_1.1.5          assertthat_0.2.1        
 [25] Matrix_1.2-18            lazyeval_0.2.2           acepack_1.4.1           
 [28] htmltools_0.4.0          prettyunits_1.1.0        tools_3.6.2             
 [31] gtable_0.3.0             glue_1.3.1               GenomeInfoDbData_1.2.2  
 [34] dplyr_0.8.3              rappdirs_0.3.1           Rcpp_1.0.3              
 [37] vctrs_0.2.1              Biostrings_2.54.0        gdata_2.18.0            
 [40] rtracklayer_1.46.0       iterators_1.0.12         xfun_0.12               
 [43] stringr_1.4.0            lifecycle_0.1.0          gtools_3.8.1            
 [46] XML_3.99-0.3             dendextend_1.13.2        MASS_7.3-51.5           
 [49] zlibbioc_1.32.0          scales_1.1.0             TSP_1.1-7               
 [52] pcaMethods_1.78.0        hms_0.5.3                RColorBrewer_1.1-2      
 [55] BBmisc_1.11              curl_4.3                 memoise_1.1.0           
 [58] heatmaply_1.0.0          gridExtra_2.3            ggplot2_3.2.1           
 [61] biomaRt_2.42.0           rpart_4.1-15             latticeExtra_0.6-29     
 [64] stringi_1.4.5            RSQLite_2.2.0            genefilter_1.68.0       
 [67] gclus_1.3.2              foreach_1.4.7            checkmate_1.9.4         
 [70] seriation_1.2-8          caTools_1.18.0           rlang_0.4.2             
 [73] pkgconfig_2.0.3          bitops_1.0-6             lattice_0.20-38         
 [76] purrr_0.3.3              GenomicAlignments_1.22.1 htmlwidgets_1.5.1       
 [79] bit_1.1-15.1             tidyselect_0.2.5         plyr_1.8.5              
 [82] magrittr_1.5             DESeq2_1.26.0            R6_2.4.1                
 [85] gplots_3.0.1.2           Hmisc_4.3-0              DBI_1.1.0               
 [88] pillar_1.4.3             foreign_0.8-75           survival_3.1-8          
 [91] RCurl_1.98-1.1           nnet_7.3-12              tibble_2.1.3            
 [94] crayon_1.3.4             KernSmooth_2.23-16       BiocFileCache_1.10.2    
 [97] plotly_4.9.1             viridis_0.5.1            jpeg_0.1-8.1            
[100] progress_1.2.2           locfit_1.5-9.1           grid_3.6.2              
[103] blob_1.2.1               digest_0.6.23            webshot_0.5.2           
[106] xtable_1.8-4             tidyr_1.0.0              openssl_1.4.1           
[109] munsell_0.5.0            registry_0.5-1           viridisLite_0.3.0       
[112] askpass_1.1

mxblsdl commented 4 years ago

I am trying to diagnose a problem with running in parallel on Ubuntu, and I think this may be related. I have a function that uses data.table to perform a number of calculations, and when I run it in parallel with future_lapply and look at top, I see all of the R sessions running at 500%+ CPU.

I know data.table runs multi-threaded by default, and I was wondering if this could be the cause of the CPU overage. Using plan(multisession) to set the future
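One way to test that hypothesis, sketched here under the assumption that data.table, future and future.apply are installed: with plan(multisession) every worker is a separate R session, so data.table's thread count has to be limited inside the worker. setDTthreads() and getDTthreads() are data.table's own functions for this; the task body is a placeholder.

```r
library(future.apply)   # also attaches future

plan(multisession, workers = 2L)

threads <- future_lapply(1:2, function(i) {
    # Limit data.table to one thread in this worker session.
    data.table::setDTthreads(1L)
    # Report what the worker will actually use.
    data.table::getDTthreads()
})

plan(sequential)   # shut the background sessions down again
```

If the CPU usage seen in top drops back to roughly 100% per R session with this in place, data.table's multi-threading is a likely contributor.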

Session Info


R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tictoc_1.0         sf_0.8-0           future.apply_1.4.0 future_1.15.1      data.table_1.12.8  raster_3.0-7       sp_1.3-2           optparse_1.6.4    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         compiler_3.6.2     pillar_1.4.3       class_7.3-15       tools_3.6.2        zeallot_0.1.0      digest_0.6.23      tibble_2.1.3      
 [9] lattice_0.20-38    pkgconfig_2.0.3    rlang_0.4.2        DBI_1.1.0          cli_1.1.0          rstudioapi_0.10    yaml_2.2.0         parallel_3.6.2    
[17] rgdal_1.4-8        e1071_1.7-3        vctrs_0.2.0        globals_0.12.5     classInt_0.4-1     grid_3.6.2         getopt_1.20.3      listenv_0.8.0     
[25] fansi_0.4.1        magrittr_1.5       backports_1.1.5    codetools_0.2-16   units_0.6-5        assertthat_0.2.1   KernSmooth_2.23-16 utf8_1.1.4        
[33] crayon_1.3.4   

c-mertes commented 4 years ago

Thanks @HenrikBengtsson, this was really helpful. I can confirm that RhpcBLASctl::omp_set_num_threads(1L) did the trick for us, and now it also runs through on my WSL.

Since we use RcppArmadillo in our optimization, an alternative workaround would be to compile the package with the flag -DARMA_DONT_USE_OPENMP. This way we no longer have to care about Snow or Multicore and how the end user parallelizes. On the other hand, we lose the OpenMP parallelization when running in serial mode.

@lshep here is a smaller example of what is going wrong. It still uses OUTRIDER and its internal C function, but if needed I could try to write a simpler C++ function. In the end, the C function does some matrix multiplications and some element-wise operations, which are parallelized with OpenMP, and returns a single value.


# load BiocParallel
library(BiocParallel)

# create example data
q <- 40
n <- 200
m <- 20000

b     <- abs(rnorm(m))
D     <- matrix(rnorm((q)*m), nrow=m)
k     <- matrix(rnbinom(n*m, 10, mu=400), nrow=n)
theta <- abs(rnorm(m))
mask  <- matrix(1, nrow=m, ncol=n)
sf    <- abs(rnorm(n, mean=1))
H     <- matrix(rnorm(q*n), ncol=q)

# Serial with 1 OpenMP thread works
RhpcBLASctl::omp_set_num_threads(1L)
BPPARAM <- SerialParam(progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})

# Serial with 10 OpenMP threads works
RhpcBLASctl::omp_set_num_threads(10L)
BPPARAM <- SerialParam(progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})

# Multicore with 1 OpenMP thread works
RhpcBLASctl::omp_set_num_threads(1L)
BPPARAM <- MulticoreParam(workers=4, tasks=40, progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})

# Multicore with 10 OpenMP threads does not work
RhpcBLASctl::omp_set_num_threads(10L)
BPPARAM <- MulticoreParam(workers=4, tasks=40, progressbar=TRUE)
bplapply(seq_along(b), BPPARAM=BPPARAM, function(i) { 
    OUTRIDER:::truncLogLiklihoodD(par = c(b[i], D[i,]), H=H, k = k[,i], sf = sf,
            exclusionMask = mask[i,], theta = theta[i], thetaC = mask)})
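
If the oversubscription explanation given earlier in the thread holds, the failing case differs from the working ones only in how many threads compete for the CPU: 4 workers times 10 OpenMP threads each. A rough sketch of that arithmetic follows, with a hypothetical helper threads_per_worker() that keeps workers times threads within a core budget. The core counts are made-up example values, not measurements from this machine.

```r
# Hypothetical helper: how many OpenMP threads each worker may use so that
# workers * threads does not exceed the number of cores we are allowed to use.
threads_per_worker <- function(workers, cores) {
    stopifnot(workers >= 1L, cores >= 1L)
    # Integer division, but never fewer than one thread per worker.
    max(1L, as.integer(cores %/% workers))
}

workers     <- 4L
omp_threads <- 10L

# Total threads in the failing case above.
total_threads <- workers * omp_threads

# Example budgets (assumed core counts, for illustration only).
per_worker_8 <- threads_per_worker(workers, cores = 8L)
per_worker_2 <- threads_per_worker(workers, cores = 2L)
```

Here total_threads is 40, per_worker_8 is 2 and per_worker_2 is 1. The value for the machine at hand could then be passed to RhpcBLASctl::omp_set_num_threads() before the bplapply call.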

And my R session is:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.20.1

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1            htmlTable_1.13.3            XVector_0.26.0             
  [4] GenomicRanges_1.38.0        base64enc_0.1-3             rstudioapi_0.10            
  [7] bit64_0.9-7                 AnnotationDbi_1.48.0        codetools_0.2-16           
 [10] splines_3.6.2               PRROC_1.3.1                 geneplotter_1.64.0         
 [13] knitr_1.27                  Formula_1.2-3               jsonlite_1.6               
 [16] Rsamtools_2.2.1             RhpcBLASctl_0.20-17         annotate_1.64.0            
 [19] cluster_2.0.9               OUTRIDER_1.4.0              dbplyr_1.4.2               
 [22] png_0.1-7                   pheatmap_1.0.12             compiler_3.6.2             
 [25] httr_1.4.1                  backports_1.1.5             assertthat_0.2.1           
 [28] Matrix_1.2-18               lazyeval_0.2.2              acepack_1.4.1              
 [31] htmltools_0.4.0             prettyunits_1.1.1           tools_3.6.2                
 [34] gtable_0.3.0                glue_1.3.1                  GenomeInfoDbData_1.2.2     
 [37] dplyr_0.8.3                 rappdirs_0.3.1              Rcpp_1.0.3                 
 [40] Biobase_2.46.0              vctrs_0.2.2                 Biostrings_2.54.0          
 [43] gdata_2.18.0                rtracklayer_1.46.0          iterators_1.0.12           
 [46] xfun_0.12                   stringr_1.4.0               lifecycle_0.1.0            
 [49] gtools_3.8.1                XML_3.99-0.3                dendextend_1.13.2          
 [52] MASS_7.3-51.4               zlibbioc_1.32.0             scales_1.1.0               
 [55] TSP_1.1-8                   pcaMethods_1.78.0           hms_0.5.3                  
 [58] parallel_3.6.2              SummarizedExperiment_1.16.1 RColorBrewer_1.1-2         
 [61] BBmisc_1.11                 curl_4.3                    memoise_1.1.0              
 [64] heatmaply_1.0.0             gridExtra_2.3               ggplot2_3.2.1              
 [67] biomaRt_2.42.0              rpart_4.1-15                latticeExtra_0.6-29        
 [70] stringi_1.4.5               RSQLite_2.2.0               genefilter_1.68.0          
 [73] gclus_1.3.2                 S4Vectors_0.24.3            foreach_1.4.7              
 [76] checkmate_1.9.4             seriation_1.2-8             GenomicFeatures_1.38.1     
 [79] caTools_1.18.0              BiocGenerics_0.32.0         GenomeInfoDb_1.22.0        
 [82] rlang_0.4.3                 pkgconfig_2.0.3             matrixStats_0.55.0         
 [85] bitops_1.0-6                lattice_0.20-38             purrr_0.3.3                
 [88] GenomicAlignments_1.22.1    htmlwidgets_1.5.1           bit_1.1-15.1               
 [91] tidyselect_1.0.0            plyr_1.8.5                  magrittr_1.5               
 [94] DESeq2_1.26.0               R6_2.4.1                    IRanges_2.20.2             
 [97] gplots_3.0.1.2              Hmisc_4.3-0                 DelayedArray_0.12.2        
[100] DBI_1.1.0                   pillar_1.4.3                foreign_0.8-71             
[103] survival_2.44-1.1           RCurl_1.98-1.1              nnet_7.3-12                
[106] tibble_2.1.3                crayon_1.3.4                KernSmooth_2.23-16         
[109] BiocFileCache_1.10.2        plotly_4.9.1                viridis_0.5.1              
[112] jpeg_0.1-8.1                progress_1.2.2              locfit_1.5-9.1             
[115] grid_3.6.2                  data.table_1.12.8           blob_1.2.1                 
[118] digest_0.6.23               webshot_0.5.2               xtable_1.8-4               
[121] tidyr_1.0.2                 openssl_1.4.1               stats4_3.6.2               
[124] munsell_0.5.0               registry_0.5-1              viridisLite_0.3.0          
[127] askpass_1.1