Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

BatchtoolsParam fails to propagate errors in bpiterate #257

Open DarwinAwardWinner opened 1 year ago

DarwinAwardWinner commented 1 year ago

I've discovered a case where BatchtoolsParam behaves differently from other backends. Consider the following reprex:

library(BiocParallel)
library(iterators)
library(assertthat)
library(testthat)
## Convert a foreach iterator into a BiocParallel iterator
makeIter <- function(it) {
  f <- function() {
    tryCatch(it$nextElem(), error = function(e) {
      if (e$message == "StopIteration") {
        NULL
      } else {
        stop(e)
      }
    })
  }
  f
}

## Example non-error usage of makeIter
bpiterate(
  makeIter(icount(10)),
  sqrt,
  BPPARAM = SerialParam()
)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1.414214
#> 
#> [[3]]
#> [1] 1.732051
#> 
#> [[4]]
#> [1] 2
#> 
#> [[5]]
#> [1] 2.236068
#> 
#> [[6]]
#> [1] 2.44949
#> 
#> [[7]]
#> [1] 2.645751
#> 
#> [[8]]
#> [1] 2.828427
#> 
#> [[9]]
#> [1] 3
#> 
#> [[10]]
#> [1] 3.162278

## Correctly throws the error
expect_error(bpiterate(
  makeIter(iterators::icount(10)),
  function(x) stop("This function always throws an error"),
  BPPARAM = SerialParam()
))

## Correctly throws the error
expect_error(bpiterate(
  makeIter(iterators::icount(10)),
  function(x) stop("This function always throws an error"),
  BPPARAM = MulticoreParam(workers = 2)
))

## Correctly throws the error
expect_error(bpiterate(
  makeIter(iterators::icount(10)),
  function(x) stop("This function always throws an error"),
  BPPARAM = SnowParam(workers = 2)
))

## Does not throw the error, but collects in attr(,"errors")
resList <- bpiterate(
  makeIter(iterators::icount(10)),
  function(x) stop("This function always throws an error"),
  BPPARAM = BatchtoolsParam(cluster = "socket", workers = 2)
)
#> Submitting 10 jobs in 2 chunks using cluster functions 'Socket' ...

print(resList)
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL
#> 
#> [[4]]
#> NULL
#> 
#> [[5]]
#> NULL
#> 
#> [[6]]
#> NULL
#> 
#> [[7]]
#> NULL
#> 
#> [[8]]
#> NULL
#> 
#> [[9]]
#> NULL
#> 
#> [[10]]
#> NULL
#> 
#> attr(,"errors")
#> attr(,"errors")$`10`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`1`
#> <remote_error in FUN(...): This function always throws an error>
#> traceback() available as 'attr(x, "traceback")'
#> 
#> attr(,"errors")$`2`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`3`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`4`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`5`
#> <remote_error in FUN(...): This function always throws an error>
#> traceback() available as 'attr(x, "traceback")'
#> 
#> attr(,"errors")$`6`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`7`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`8`
#> <unevaluated_error: not evaluated due to previous error>
#> 
#> attr(,"errors")$`9`
#> <unevaluated_error: not evaluated due to previous error>

## This assertion fails
assert_that(!any(bpok(resList)))
#> Error: !any(bpok(resList)) is not TRUE
## This assertion passes
assert_that(!any(bpok(attr(resList, "errors"))))
#> [1] TRUE

Created on 2023-07-07 with reprex v2.0.2

With the other backends tested (SerialParam, MulticoreParam, SnowParam), the bpiterate call throws an error, as verified above with expect_error. BatchtoolsParam, however, does not throw: it returns a list whose elements are all NULL, with the errors raised during iteration stored in an "errors" attribute. Furthermore, bpok(resList) does not flag any problem, so the failure is easy to miss. I would expect BatchtoolsParam to behave the same as the other backends here.
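Until the backends agree, a caller-side guard can turn the collected errors back into a thrown error. The following is only a minimal sketch: stopOnCollectedErrors is a hypothetical helper, not part of BiocParallel, and it assumes the stored errors are ordinary R condition objects, as the printed output above suggests. The mock object merely imitates the shape of resList so the example runs with base R alone.

```r
## Hypothetical helper (not a BiocParallel function): re-raise errors that a
## backend stored in attr(res, "errors") instead of throwing them.
stopOnCollectedErrors <- function(res) {
  errs <- attr(res, "errors")
  if (is.null(errs)) {
    return(res)  # nothing was collected, pass the result through unchanged
  }
  ## Assumption: collected entries inherit from "condition"
  isErr <- vapply(errs, inherits, logical(1), what = "condition")
  if (any(isErr)) {
    msgs <- vapply(errs[isErr], conditionMessage, character(1))
    stop(sprintf("%d of %d iterations failed; first stored message: %s",
                 sum(isErr), length(res), msgs[[1]]),
         call. = FALSE)
  }
  res
}

## Mock result imitating the structure returned by BatchtoolsParam above
mock <- structure(
  list(NULL, NULL),
  errors = list(
    `1` = simpleError("This function always throws an error"),
    `2` = simpleError("not evaluated due to previous error")
  )
)

## The guard now throws; capture the message for inspection
outcome <- tryCatch(
  stopOnCollectedErrors(mock),
  error = function(e) conditionMessage(e)
)

## A result without an "errors" attribute is returned untouched
clean <- stopOnCollectedErrors(list(1, 2))
```

In the reprex this would be used as stopOnCollectedErrors(bpiterate(...)), which should give the same fail-fast behavior as the other backends, provided the assumption about the "errors" attribute holds.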

DarwinAwardWinner commented 1 year ago

Session info:

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-apple-darwin22.4.0 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /usr/local/Cellar/openblas/0.3.23/lib/libopenblasp-r0.3.23.dylib 
LAPACK: /usr/local/Cellar/r/4.3.0_1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base     

other attached packages:
 [1] testthat_3.1.8      assertthat_0.2.1    iterators_1.0.14   
 [4] BiocParallel_1.34.2 tidyr_1.3.0         future_1.32.0      
 [7] devtools_2.4.5      usethis_2.1.6       openxlsx_4.2.5.2   
[10] magrittr_2.0.3      dplyr_1.1.2         rex_1.2.1          
[13] glue_1.6.2          stringr_1.5.0       ggplot2_3.4.2      
[16] colorout_1.2-2     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  R.utils_2.12.2    fastmap_1.1.1     promises_1.2.0.1 
 [5] reprex_2.0.2      digest_0.6.32     base64url_1.4     mime_0.12        
 [9] lifecycle_1.0.3   ellipsis_0.3.2    processx_3.8.1    compiler_4.3.0   
[13] rlang_1.1.1       progress_1.2.2    tools_4.3.0       yaml_2.3.7       
[17] utf8_1.2.3        data.table_1.14.8 knitr_1.43        prettyunits_1.1.1
[21] brew_1.0-8        htmlwidgets_1.6.2 pkgbuild_1.4.2    batchtools_0.9.17
[25] pkgload_1.3.2     R.cache_0.16.0    miniUI_0.1.1.1    withr_2.5.0      
[29] purrr_1.0.1       R.oo_1.25.0       desc_1.4.2        grid_4.3.0       
[33] fansi_1.0.4       urlchecker_1.0.1  profvis_0.3.8     xtable_1.8-4     
[37] colorspace_2.1-0  globals_0.16.2    scales_1.2.1      cli_3.6.1        
[41] rmarkdown_2.21    crayon_1.5.2      generics_0.1.3    remotes_2.4.2    
[45] rstudioapi_0.14   sessioninfo_1.2.2 cachem_1.0.8      parallel_4.3.0   
[49] vctrs_0.6.2       callr_3.7.3       hms_1.1.3         listenv_0.9.0    
[53] clipr_0.8.0       snow_0.4-4        parallelly_1.36.0 codetools_0.2-19 
[57] ps_1.7.5          stringi_1.7.12    gtable_0.3.3      later_1.3.1      
[61] munsell_0.5.0     tibble_3.2.1      styler_1.10.1     pillar_1.9.0     
[65] rappdirs_0.3.3    htmltools_0.5.5   brio_1.1.3        R6_2.5.1         
[69] rprojroot_2.0.3   shiny_1.7.4       evaluate_0.21     R.methodsS3_1.8.2
[73] backports_1.4.1   memoise_2.0.1     httpuv_1.6.10     Rcpp_1.0.10      
[77] zip_2.3.0         checkmate_2.2.0   xfun_0.39         fs_1.6.2         
[81] pkgconfig_2.0.3  

Session info for the compute cluster, where the same behavior is observed:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /hpc/packages/minerva-centos7/intel/parallel_studio_xe_2019/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] testthat_3.1.6              qs_0.25.4                   future.apply_1.10.0         future.batchtools_0.11.0    parallelly_1.34.0           batchtools_0.9.15
 [7] future_1.30.0               goseq_1.48.0                geneLenDataBase_1.32.0      BiasedUrn_2.0.8             iterators_1.0.14            retry_0.1.0
[13] ggrepel_0.9.2               topGO_2.48.0                GO.db_3.15.0                EnrichmentBrowser_2.26.0    graph_1.74.0                AnnotationHub_3.4.0
[19] BiocFileCache_2.4.0         dbplyr_2.3.0                cachem_1.0.6                memoise_2.0.1               scales_1.2.1                SummarizedExperiment_1.26.1
[25] MatrixGenerics_1.8.1        matrixStats_0.63.0          rctutils_0.1.0              variancePartition_1.26.0    BiocParallel_1.30.3         lme4_1.1-31
[31] Matrix_1.5-3                edgeR_3.38.1                limma_3.52.4                ensembldb_2.20.2            AnnotationFilter_1.20.0     GenomicFeatures_1.48.3
[37] AnnotationDbi_1.58.0        Biobase_2.56.0              GenomicRanges_1.48.0        GenomeInfoDb_1.32.2         IRanges_2.30.0              S4Vectors_0.34.0
[43] BiocGenerics_0.42.0         withr_2.5.0                 assertthat_0.2.1            rex_1.2.1                   fs_1.6.0                    magrittr_2.0.3
[49] forcats_0.5.2               stringr_1.5.0               dplyr_1.1.2                 purrr_1.0.1                 tidyr_1.3.0                 tibble_3.2.1
[55] tidyverse_1.3.2             glmmLasso_1.6.2             broom_1.0.3                 rms_6.4-1                   SparseM_1.81                Hmisc_4.7-2
[61] ggplot2_3.4.0               Formula_1.2-4               survival_3.5-0              lattice_0.20-45             openxlsx_4.2.5.1            lubridate_1.9.1
[67] readr_2.1.3                 rlang_1.1.1                 colorout_1.2-2

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                rtracklayer_1.56.1            bit64_4.0.5                   knitr_1.42                    multcomp_1.4-20
  [6] DelayedArray_0.22.0           data.table_1.14.6             rpart_4.1.19                  KEGGREST_1.36.2               RCurl_1.98-1.10
 [11] doParallel_1.0.17             generics_0.1.3                snow_0.4-4                    RhpcBLASctl_0.21-247.1        TH.data_1.1-1
 [16] RSQLite_2.2.20                RApiSerialize_0.1.2           polspline_1.1.22              bit_4.0.5                     tzdb_0.3.0
 [21] base64url_1.4                 xml2_1.3.3                    httpuv_1.6.8                  gargle_1.2.1                  xfun_0.36
 [26] hms_1.1.2                     promises_1.2.0.1              fansi_1.0.4                   restfulr_0.0.15               progress_1.2.2
 [31] caTools_1.18.2                readxl_1.4.1                  Rgraphviz_2.40.0              DBI_1.1.3                     htmlwidgets_1.6.1
 [36] googledrive_2.0.0             ellipsis_0.3.2                backports_1.4.1               annotate_1.74.0               aod_1.3.2
 [41] RcppParallel_5.1.6            biomaRt_2.52.0                deldir_1.0-6                  vctrs_0.6.2                   quantreg_5.94
 [46] vroom_1.6.1                   checkmate_2.1.0               GenomicAlignments_1.32.0      prettyunits_1.1.1             cluster_2.1.4
 [51] lazyeval_0.2.2                crayon_1.5.2                  labeling_0.4.2                pkgconfig_2.0.3               nlme_3.1-161
 [56] ProtGenerics_1.28.0           nnet_7.3-18                   globals_0.16.2                lifecycle_1.0.3               MatrixModels_0.5-1
 [61] sandwich_3.0-2                filelock_1.0.2                modelr_0.1.10                 cellranger_1.1.0              boot_1.3-28.1
 [66] zoo_1.8-11                    reprex_2.0.2                  base64enc_0.1-3               googlesheets4_1.0.1           stringfish_0.15.7
 [71] png_0.1-8                     rjson_0.2.21                  bitops_1.0-7                  debugme_1.1.0                 KernSmooth_2.23-20
 [76] Biostrings_2.64.0             blob_1.2.3                    brew_1.0-8                    jpeg_0.1-10                   GSEABase_1.58.0
 [81] plyr_1.8.8                    gplots_3.1.3                  zlibbioc_1.42.0               compiler_4.2.0                BiocIO_1.6.0
 [86] RColorBrewer_1.1-3            KEGGgraph_1.56.0              Rsamtools_2.12.0              cli_3.6.0                     XVector_0.36.0
 [91] listenv_0.9.0                 htmlTable_2.4.1               MASS_7.3-58.2                 mgcv_1.8-41                   tidyselect_1.2.0
 [96] stringi_1.7.12                yaml_2.3.7                    locfit_1.5-9.7                latticeExtra_0.6-30           grid_4.2.0
[101] tools_4.2.0                   timechange_0.2.0              rstudioapi_0.14               foreach_1.5.2                 foreign_0.8-84
[106] gridExtra_2.3                 farver_2.1.1                  digest_0.6.31                 BiocManager_1.30.19           shiny_1.7.4
[111] Rcpp_1.0.10                   BiocVersion_3.15.2            later_1.3.0                   httr_1.4.4                    Rdpack_2.4
[116] colorspace_2.1-0              brio_1.1.3                    rvest_1.0.3                   XML_3.99-0.10                 xtable_1.8-4
[121] jsonlite_1.8.4                nloptr_2.0.3                  R6_2.5.1                      pillar_1.9.0                  htmltools_0.5.4
[126] mime_0.12                     glue_1.6.2                    fastmap_1.1.0                 minqa_1.2.5                   interactiveDisplayBase_1.34.0
[131] codetools_0.2-18              mvtnorm_1.1-3                 utf8_1.2.3                    pbkrtest_0.5.2                curl_5.0.0
[136] gtools_3.9.4                  zip_2.2.2                     interp_1.1-3                  munsell_0.5.0                 GenomeInfoDbData_1.2.8
[141] haven_2.5.1                   reshape2_1.4.4                gtable_0.3.1                  rbibutils_2.2.13

DarwinAwardWinner commented 1 year ago

Note: I originally discovered this issue in the context of the use of bpiterate in variancePartition:::.fitVarPartModel (version 1.26.0).