HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org
84 stars 6 forks source link

Bug when using {emayili} with {doFuture} #75

Closed datawookie closed 1 year ago

datawookie commented 1 year ago

Describe the bug

Firstly I should point out that this is NOT a bug with {doFuture}. However, it is a bug reported on my {emayili} package that only manifests itself with {doFuture}. I have tried to debug this and got nowhere. I suspect that this is because I really don't understand how {doFuture} works. I was hoping that you would please help or give me some suggestions of where I might look.

The original bug is described here.

Reproduce example

The code below produces the error.

library(emayili)
library(doFuture)
library(parallel)
library(foreach)

numCores <- (detectCores() * 0.50)
registerDoFuture()
cl <- makeCluster(numCores)
plan(cluster, workers = cl)

foreach(i = 1:10) %dopar% {
  print(i)
}

stopCluster(cl)

Importantly note that no functions from {emayili} are being explicitly called. Simply importing the package is enough to break the script.

I have tried changing the order of importing {emayili} (first and last) and this does not seem to have any effect.

This is the error message:

Error in str_detect(addr, "[<>]") : 
  `string` must be a vector, not a function.

Here is the backtrace:

> traceback()
12: stop(condition)
11: signalConditions(obj, exclude = getOption("future.relay.immediate", 
        "immediateCondition"), resignal = resignal, ...)
10: signalConditionsASAP(obj, resignal = FALSE, pos = ii)
9: resolve.list(fs, result = TRUE, stdout = TRUE, signal = TRUE)
8: resolve(fs, result = TRUE, stdout = TRUE, signal = TRUE)
7: withCallingHandlers({
       resolve(fs, result = TRUE, stdout = TRUE, signal = TRUE)
   }, RngFutureCondition = function(cond) {
       idx <- NULL
       uuid <- attr(cond, "uuid")
       if (!is.null(uuid)) {
           for (kk in seq_along(fs)) {
               if (identical(fs[[kk]]$uuid, uuid)) 
                   idx <- kk
           }
       }
       else {
           f <- attr(cond, "future")
           if (is.null(f)) 
               return()
           if (!isFALSE(f$seed)) 
               return()
           for (kk in seq_along(fs)) {
               if (identical(fs[[kk]], f)) 
                   idx <- kk
    ...
6: e$fun(obj, substitute(ex), parent.frame(), e$data)
5: foreach(i = 1:10) %dopar% {
       print(i)
   } at issue.R#11
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("~/proj/438-emayili/issue.R", echo = TRUE)

I tracked the function call mentioned in the error message to a generic function in {emayili}:

as.address <- function(addr, validate = FALSE) {
  if ("address" %in% class(addr)) {
    addr
  } else {
    display <- ifelse(
      str_detect(addr, "[<>]"),
      str_extract(addr, ".*<") %>% str_remove("<"),
      NA
    )
    email <- ifelse(
      str_detect(addr, "[<>]"),
      str_extract(addr, "<.*>") %>% str_remove_all("[<>]"),
      addr
    )

    address(email, display, validate = validate)
  }
}

I seriously do not expect you to solve the problem. But if you could suggest where I should start looking or how I could debug this I would be most grateful!

Expected behavior

Simply including the package should not make any difference to the execution of the script.

Session information Please share your session information, e.g.

> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=en_ZA.UTF-8       LC_NUMERIC=C               LC_TIME=en_ZA.UTF-8        LC_COLLATE=en_ZA.UTF-8     LC_MONETARY=en_ZA.UTF-8   
 [6] LC_MESSAGES=en_ZA.UTF-8    LC_PAPER=en_ZA.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] emayili_0.7.18 doFuture_1.0.0 future_1.32.0  foreach_1.5.2 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10         pillar_1.9.0        compiler_4.2.2      base64enc_0.1-3     iterators_1.0.14    tools_4.2.2        
 [7] digest_0.6.31       evaluate_0.20       lifecycle_1.0.3     tibble_3.2.1        pkgconfig_2.0.3     rlang_1.1.0        
[13] cli_3.6.1           rstudioapi_0.14     commonmark_1.8.1    curl_5.0.0          xfun_0.40           fastmap_1.1.1      
[19] knitr_1.42          dplyr_1.1.2         httr_1.4.5          stringr_1.5.0       xml2_1.3.3          generics_0.1.3     
[25] vctrs_0.6.1         globals_0.16.2      triebeard_0.4.1     tidyselect_1.2.0    glue_1.6.2          listenv_0.9.0      
[31] R6_2.5.1            future.apply_1.11.0 fansi_1.0.4         parallelly_1.35.0   rmarkdown_2.20      purrr_1.0.1        
[37] tidyr_1.3.0         logger_0.2.2        magrittr_2.0.3      urltools_1.7.3      codetools_0.2-19    htmltools_0.5.4    
[43] rvest_1.0.3         mime_0.12           utf8_1.2.3          stringi_1.7.12
HenrikBengtsson commented 1 year ago

Thank you; here's a minimal reproducible example:

library(emayili)
library(doFuture)
plan(cluster, workers = 1)
y <- foreach(x = NULL) %dofuture% NULL

I'll investigate.

datawookie commented 1 year ago

Thank you @HenrikBengtsson!

HenrikBengtsson commented 1 year ago

This has been fixed in future (>= 1.33.0-9003).

The problem was that a getExpression() method of the future package ended up calling emayili::local() instead of base::local() as intended whenever emayili was attached on the parallel worker. In turn, emayili::local() called as.address(), giving the error. It only happened for cluster/multisession futures.

HenrikBengtsson commented 1 year ago

Here's another example, without emayili, illustrating the problem:

> local <- function(...) stop("Wrong local()!")
> library(doFuture)
> plan(cluster, workers = 1)
> y <- foreach(x = NULL) %dofuture% NULL
Error in local({ : Wrong local()!
datawookie commented 1 year ago

Wow! Thanks @HenrikBengtsson for tracking down and fixing this so promptly. Really appreciate it. Best regards, Andrew.