Closed GiuliaPais closed 1 year ago
Thanks for the report.
It might help to clarify what `stop.on.error=` does. Suppose there are two workers and 4 tasks numbered 1, 2, 3, 4. Each task is to compute `foo()`. Each worker gets two tasks: 1 and 2 go to the first worker, 3 and 4 to the second worker. If `stop.on.error=TRUE`, the first worker tries to evaluate `foo(1)`. This fails, and since `stop.on.error=TRUE`, it does not try to evaluate task 2. Likewise, the second worker, in parallel with the first, tries to evaluate `foo(3)`. This fails, and the second worker does not try to evaluate `foo(4)`. This is reported as 2 remote errors and 2 unevaluated errors:
> bplapply(1:4, foo, BPPARAM = SnowParam(2, stop.on.error = TRUE))
Error: BiocParallel errors
2 remote errors, element index: 1, 3
2 unevaluated and other errors
first remote error:
Error in bar(num_vec): could not find function "bar"
Using `stop.on.error = TRUE` is very appropriate in this situation, since `bar()` will not magically be available to other tasks on the same worker.
Suppose `stop.on.error = FALSE`. The first worker tries task 1 (`foo(1)`). This fails, so it tries `foo(2)`, which also fails. Likewise for the second worker, trying `foo(3)` and then `foo(4)`. We see 4 remote errors:
> bplapply(1:4, foo, BPPARAM = SnowParam(2, stop.on.error = FALSE))
Error: BiocParallel errors
4 remote errors, element index: 1, 2, 3, 4
0 unevaluated and other errors
first remote error:
Error in bar(num_vec): could not find function "bar"
This might be appropriate if the error was somehow stochastic, e.g., a numerical method that sometimes fails to converge, so that it still makes sense to continue trying other tasks.
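The two scenarios above can be reproduced with a small toy model in base R. This is only a sketch of the scheduling as described in this comment, not BiocParallel's actual implementation: `simulate_workers()` and its arguments are hypothetical names, the workers are walked one after another rather than in parallel, and tasks are assumed to be split into contiguous, balanced chunks.

```r
# Toy model of the behaviour described above; NOT BiocParallel's implementation.
# Tasks are split into contiguous chunks, one chunk per worker, and each
# (simulated) worker evaluates its chunk in order.
simulate_workers <- function(n_tasks, n_workers, fails, stop_on_error) {
  n_chunks <- min(n_tasks, n_workers)
  # Map task i to a worker so that chunks are contiguous and balanced.
  worker_of <- ceiling(seq_len(n_tasks) * n_chunks / n_tasks)
  chunks <- split(seq_len(n_tasks), worker_of)

  status <- rep("unevaluated", n_tasks)
  for (chunk in chunks) {
    for (i in chunk) {
      if (fails(i)) {
        status[i] <- "error"
        if (stop_on_error) break  # this worker skips the rest of its chunk
      } else {
        status[i] <- "ok"
      }
    }
  }
  status
}

always_fails <- function(i) TRUE  # mimics foo() calling the missing bar()

# 4 tasks, 2 workers: errors at 1 and 3, tasks 2 and 4 never evaluated
simulate_workers(4, 2, always_fails, stop_on_error = TRUE)

# 4 tasks, 2 workers: every task is tried and every task errors
simulate_workers(4, 2, always_fails, stop_on_error = FALSE)

# 2 tasks, 4 workers: one task per worker, so the flag changes nothing
simulate_workers(2, 4, always_fails, stop_on_error = TRUE)
simulate_workers(2, 4, always_fails, stop_on_error = FALSE)
```

In this model the flag only matters when a worker still has tasks queued behind the one that failed.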
In your code, you've set the number of workers to 4. There are only two tasks (`A` and `B`), so one worker gets task `A`, the other task `B`. Both error, there are no more tasks for either worker, and `stop.on.error` makes no difference. This is what you report: 2 remote errors regardless of the value of `stop.on.error`.
You can see the expected behavior if you arrange for more tasks than workers, e.g., by adding two tasks to `data_list` and reducing the number of workers to 2:
> launch_par_function(2, stop_on_error = TRUE)
Error: BiocParallel errors
2 remote errors, element index: 1, 3
2 unevaluated and other errors
first remote error:
Error in bar(num_vec): could not find function "bar"
> launch_par_function(2, stop_on_error = FALSE)
Error: BiocParallel errors
4 remote errors, element index: 1, 2, 3, 4
0 unevaluated and other errors
first remote error:
Error in bar(num_vec): could not find function "bar"
`stop.on.error = TRUE` doesn't stop after the very first error on any worker, because that would force sequential evaluation.
Thanks for the clarification. Then I guess that, to achieve the behaviour I initially expected, a suitable approach is to use `purrr::safely` and handle any errors downstream if they arise. Thanks again.
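A minimal sketch of that idea, under stated assumptions: `safely_base()` is a hypothetical base-R stand-in that returns the same `list(result, error)` shape as `purrr::safely()`, `foo()` here is an illustrative task that fails for even inputs (not the `foo()` from the reprex), and `lapply()` stands in for `bplapply()` so the example runs without a parallel back-end.

```r
# Base-R stand-in for purrr::safely(): the wrapped function never throws and
# always returns list(result = ..., error = ...).
safely_base <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

# Hypothetical task: fails for even inputs to mimic an intermittent error.
foo <- function(x) {
  if (x %% 2 == 0) stop("task ", x, " failed")
  x * 10
}

# lapply() stands in for bplapply(). Because errors are caught inside each
# task, every task returns normally and none is skipped.
out <- lapply(1:4, safely_base(foo))

# Handle errors downstream: separate successes from failures.
ok     <- vapply(out, function(o) is.null(o$error), logical(1))
failed <- which(!ok)
values <- unlist(lapply(out[ok], `[[`, "result"))
msgs   <- vapply(out[!ok], function(o) conditionMessage(o$error), character(1))
```

With purrr installed, `purrr::safely(foo)` could presumably replace `safely_base(foo)` directly. The trade-off is that the scheduler no longer sees any failure, so the caller has to inspect every result explicitly.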
As per title, when initialising a new DoParam object with option `stop.on.error = FALSE`, the evaluation should not stop on error, but it does. Here is a reprex that demonstrates it.
Created on 2023-02-20 with reprex v2.0.2
Session info
``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Rome
#>  date     2023-02-20
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  BiocParallel   1.32.5  2022-12-23 [1] Bioconductor
#>  cli            3.6.0   2023-01-09 [1] CRAN (R 4.2.0)
#>  codetools      0.2-19  2023-02-01 [1] CRAN (R 4.2.0)
#>  digest         0.6.31  2022-12-11 [1] CRAN (R 4.2.1)
#>  doFuture       0.12.2  2022-04-26 [1] CRAN (R 4.2.0)
#>  evaluate       0.20    2023-01-17 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  foreach        1.5.2   2022-02-02 [1] CRAN (R 4.2.0)
#>  fs             1.6.1   2023-02-06 [1] CRAN (R 4.2.0)
#>  future         1.31.0  2023-02-01 [1] CRAN (R 4.2.0)
#>  globals        0.16.2  2022-11-21 [1] CRAN (R 4.2.0)
#>  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.4   2022-12-07 [1] CRAN (R 4.2.0)
#>  iterators      1.0.14  2022-02-05 [1] CRAN (R 4.2.0)
#>  knitr          1.42    2023-01-25 [1] CRAN (R 4.2.1)
#>  lifecycle      1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  listenv        0.9.0   2022-12-16 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  parallelly     1.34.0  2023-01-13 [1] CRAN (R 4.2.0)
#>  purrr          1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache        0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3    1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo           1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils        2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  reprex         2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang          1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.20    2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  styler         1.9.0   2023-01-15 [1] CRAN (R 4.2.0)
#>  vctrs          0.5.2   2023-01-23 [1] CRAN (R 4.2.0)
#>  withr          2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun           0.37    2023-01-31 [1] CRAN (R 4.2.0)
#>  yaml           2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```