SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

Infinite execution when multithreading is enabled #237

Status: Closed (closed by ATpoint 1 year ago)

ATpoint commented 1 year ago

Minimal reproducible example (MRE):

library(SingleR)
library(scuttle)

ref <- .mockRefData()
test <- .mockTestData(ref)

ref <- scuttle::logNormCounts(ref)
test <- scuttle::logNormCounts(test)

# Running the classification with all detected cores as threads:
pred <- SingleR(test, ref, labels=ref$label, num.threads = parallel::detectCores())

If you run this (with num.threads) on a machine with many cores (here 20), the process starts (according to top), but after a few seconds CPU usage drops to 0% and it hangs there forever. This was reproducible inside the Bioconductor Docker container (bioconductor/bioconductor_docker:RELEASE_3_16), on Windows, and on a GitPod runner; the former two via RStudio, the latter directly via the R console.

-Alex

R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] viridis_0.6.2               viridisLite_0.4.1           ggrepel_0.9.3               colorspace_2.1-0            UCell_2.2.0                 lubridate_1.9.2            
 [7] forcats_1.0.0               stringr_1.5.0               dplyr_1.1.0                 purrr_1.0.1                 readr_2.1.4                 tidyr_1.3.0                
[13] tibble_3.2.0                tidyverse_2.0.0             SingleR_2.0.0               scran_1.26.2                scDblFinder_1.12.0          scater_1.26.1              
[19] ggplot2_3.4.1               scuttle_1.8.4               RcppML_0.5.6                patchwork_1.1.2             org.Mm.eg.db_3.16.0         AnnotationDbi_1.60.2       
[25] Matrix_1.5-3                magrittr_2.0.3              magick_2.7.4                limma_3.54.2                DropletUtils_1.18.1         data.table_1.14.8          
[31] ComplexHeatmap_2.14.0       circlize_0.4.15             bluster_1.8.0               batchelor_1.14.1            SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
[37] Biobase_2.58.0              GenomicRanges_1.50.2        GenomeInfoDb_1.34.9         IRanges_2.32.0              S4Vectors_0.36.2            BiocGenerics_0.44.0        
[43] MatrixGenerics_1.10.0       matrixStats_0.63.0          BiocParallel_1.32.5        

loaded via a namespace (and not attached):
  [1] plyr_1.8.8                igraph_1.4.1              digest_0.6.31             htmltools_0.5.4           foreach_1.5.2             fansi_1.0.4               memoise_2.0.1            
  [8] ScaledMatrix_1.6.0        cluster_2.1.4             doParallel_1.0.17         openxlsx_4.2.5.2          tzdb_0.3.0                Biostrings_2.66.0         R.utils_2.12.2           
 [15] timechange_0.2.0          blob_1.2.3                xfun_0.37                 crayon_1.5.2              RCurl_1.98-1.10           jsonlite_1.8.4            iterators_1.0.14         
 [22] glue_1.6.2                gtable_0.3.1              zlibbioc_1.44.0           XVector_0.38.0            GetoptLong_1.0.5          DelayedArray_0.24.0       BiocSingular_1.14.0      
 [29] Rhdf5lib_1.20.0           shape_1.4.6               HDF5Array_1.26.0          scales_1.2.1              DBI_1.1.3                 edgeR_3.40.2              Rcpp_1.0.10              
 [36] clue_0.3-64               dqrng_0.3.0               bit_4.0.5                 rsvd_1.0.5                ResidualMatrix_1.8.0      metapod_1.6.0             httr_1.4.5               
 [43] RColorBrewer_1.1-3        ellipsis_0.3.2            pkgconfig_2.0.3           XML_3.99-0.13             R.methodsS3_1.8.2         locfit_1.5-9.7            utf8_1.2.3               
 [50] reshape2_1.4.4            tidyselect_1.2.0          rlang_1.0.6               munsell_0.5.0             tools_4.2.2               cachem_1.0.7              xgboost_1.7.3.1          
 [57] cli_3.6.0                 generics_0.1.3            RSQLite_2.3.0             evaluate_0.20             fastmap_1.1.1             yaml_2.3.7                knitr_1.42               
 [64] bit64_4.0.5               zip_2.2.2                 KEGGREST_1.38.0           sparseMatrixStats_1.10.0  R.oo_1.25.0               compiler_4.2.2            rstudioapi_0.14          
 [71] beeswarm_0.4.0            png_0.1-8                 statmod_1.5.0             stringi_1.7.12            lattice_0.20-45           vctrs_0.5.2               pillar_1.8.1             
 [78] lifecycle_1.0.3           rhdf5filters_1.10.0       GlobalOptions_0.1.2       BiocNeighbors_1.16.0      bitops_1.0-7              irlba_2.3.5.1             rtracklayer_1.58.0       
 [85] R6_2.5.1                  BiocIO_1.8.0              gridExtra_2.3             vipor_0.4.5               codetools_0.2-18          MASS_7.3-58.1             rhdf5_2.42.0             
 [92] rjson_0.2.21              withr_2.5.0               GenomicAlignments_1.34.1  Rsamtools_2.14.0          GenomeInfoDbData_1.2.9    hms_1.1.2                 beachmat_2.14.0          
 [99] rmarkdown_2.20            DelayedMatrixStats_1.20.0 ggbeeswarm_0.7.1          restfulr_0.0.15   

LTLA commented 1 year ago

For the time being, just leave one core free. This looks like a deadlock inside raticate, caused by the need to protect the R interpreter on the main thread.
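
The "leave one core free" workaround could be sketched roughly as below. The helper name threads_leaving_one_free is hypothetical and not part of SingleR; it only wraps parallel::detectCores(), which can return NA when the core count cannot be determined.

```r
# Hypothetical helper (not part of SingleR): choose a thread count that
# leaves one core free, following the workaround suggested above.
threads_leaving_one_free <- function(n.cores = parallel::detectCores()) {
    # detectCores() may return NA if the number of cores is unknown;
    # fall back to a single thread in that case.
    if (is.na(n.cores)) {
        return(1L)
    }
    # Never go below one thread, even on a single-core machine.
    max(1L, as.integer(n.cores) - 1L)
}

n.threads <- threads_leaving_one_free()
```

The resulting n.threads can then be passed as num.threads= in the SingleR() call from the MRE, in place of parallel::detectCores().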

LTLA commented 1 year ago

Just pushed beachmat 2.15.1 to BioC-devel. Try updating that, then reinstall SingleR from source.

ATpoint commented 1 year ago

Thanks, seems to work now. Can I use the GitHub version of beachmat-2.15.1 safely with an existing Bioc-3.16 environment or does that break anything other than BiocManager::valid() complaining about "too new"?

LTLA commented 1 year ago

Just pushed 2.14.1 to BioC-release so you can get it from there.
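
For a Bioc-3.16 (release) environment, the update described in this thread might look like the sketch below. It assumes BiocManager is installed and that the fixed beachmat (2.14.1) has reached your mirror. Rebuilding SingleR from source follows the earlier BioC-devel advice, and is assumed here to apply to release as well.

```r
# Sketch only: pull the patched beachmat from the release repository.
BiocManager::install("beachmat")

# Rebuild SingleR from source so it is compiled against the updated
# beachmat (force=TRUE reinstalls even if the version is unchanged).
BiocManager::install("SingleR", type = "source", force = TRUE)

# Check what ended up installed; release should report >= 2.14.1.
packageVersion("beachmat")

# Confirm the installation is consistent with the Bioconductor release.
BiocManager::valid()
```

Restart the R session after reinstalling so that the rebuilt SingleR is the one that gets loaded.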