HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

Issues when running on UNIX HPC environment #615

Open stephenashton-dhsc opened 2 years ago

stephenashton-dhsc commented 2 years ago

Describe the bug

I can run a future that calls a custom S4 method on a Windows laptop within RStudio without issue, but when I launch it on an HPC node, it fails.

Reproducible example

methods::setGeneric(
  "my_custom_method",
  function(x) {
    standardGeneric("my_custom_method")
  }
)

methods::setMethod(
  "my_custom_method",
  methods::signature(x = "numeric"),
  function(x) {
    y <- x^2
    return(y)
  }
)

library(future)
future::plan("future::multicore")
value(future({my_custom_method(1)}))

Actual output:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘my_custom_method’ for signature ‘"numeric"’

Expected behaviour

[1] 1

Session information


> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/packages/R/4.1.1/lib64/R/lib/libRblas.so
LAPACK: /usr/local/packages/R/4.1.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.25.0

loaded via a namespace (and not attached):
[1] compiler_4.1.1    parallelly_1.31.1 tools_4.1.1       parallel_4.1.1   
[5] listenv_0.8.0     codetools_0.2-18  digest_0.6.29     globals_0.14.0   

> future::futureSessionInfo()
*** Package versions
future 1.25.0, parallelly 1.31.1, parallel 4.1.1, globals 0.14.0, listenv 0.8.0

*** Allocations
availableCores():
        system cgroups.cpuset          nproc          Slurm 
            96             96             96             96 
availableWorkers():
$Slurm
 [1] "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027"
 [6] "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027"
 …
[91] "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027"

$system
 [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
 [6] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
 …
[91] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"

*** Settings
- future.plan=<not set>
- future.fork.multithreading.enable=<not set>
- future.globals.maxSize=<not set>
- future.globals.onReference=<not set>
- future.resolve.recursive=<not set>
- future.rng.onMisuse=<not set>
- future.wait.timeout=<not set>
- future.wait.interval=<not set>
- future.wait.alpha=<not set>
- future.startup.script=<not set>

*** Backends
Number of workers: 96
List of future strategies:
1. multicore:
   - args: function (..., workers = availableCores(constraints = "multicore"), envir = parent.frame())
   - tweaked: FALSE
   - call: future::plan("future::multicore")

*** Basic tests
   worker   pid     r sysname                     release
1       1 87160 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
2       2 87165 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
3       3 87171 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
…
96     96 87639 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
                               version                       nodename machine
1  #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
2  #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
3  #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
…
96 #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
     login                      user            effective_user
1  unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
2  unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
3  unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
…
96 unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
Number of unique PIDs: 96 (as expected)
stephenashton-dhsc commented 2 years ago

Please note: MYDOMAIN is a placeholder that masks the domain of my user profile and the server location.

stephenashton-dhsc commented 2 years ago

Having done some further testing, I believe this is to do with the future::multicore strategy.

It also fails when using future::sequential, but succeeds when using future::multisession.

I suspect this means that the error is actually in future::sequential, and that my session on the HPC is defaulting to it rather than using the future::multicore strategy as intended (although future::supportsMulticore() returns TRUE in the HPC environment).
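One way to test the suspicion that the session silently falls back to sequential processing is to compare the process ID (PID) of the main R session with the PID reported from inside a future. This is a sketch, not code from the thread; it assumes that a multicore future with two workers runs in a forked child process whenever `supportsMulticore()` is `TRUE`, and in the main R process otherwise. The variable names `parent_pid` and `worker_pid` are illustrative.

```r
library(future)

## Ask for two forked workers; where forking is unsupported (Windows, or
## RStudio by default), multicore futures are evaluated sequentially instead
plan(multicore, workers = 2L)

parent_pid <- Sys.getpid()
## A forked worker reports its own PID; a sequential future reports the parent's
worker_pid <- value(future(Sys.getpid()))

cat("supportsMulticore():  ", supportsMulticore(), "\n")
cat("ran in a forked child:", worker_pid != parent_pid, "\n")

plan(sequential)  ## reset the plan
```

If the last line prints `TRUE`, the multicore backend is genuinely in use, so a failure under it would not be explained by a fallback to sequential.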

stephenashton-dhsc commented 2 years ago

I can also confirm that this behaviour occurs on my Windows laptop: the code fails when using future::sequential, but succeeds when using future::multisession. It also fails when using future::multicore, but I assume this is because it defaults to future::sequential where future::supportsMulticore() is FALSE.
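The strategy comparison described in these comments can be scripted so that each backend is tried on the same example. This is a sketch, not code from the thread: the `strategies` list, the `outcome` vector and the two-worker cap (to avoid launching one R session per core on a 96-core node) are choices made for illustration, and the message recorded for each backend will depend on the installed future version.

```r
library(future)
library(methods)

## Same S4 generic and method as in the reproducible example
setGeneric("my_custom_method", function(x) standardGeneric("my_custom_method"))
setMethod("my_custom_method", signature(x = "numeric"), function(x) x^2)

## Strategies to compare, each capped at two workers
strategies <- list(
  sequential   = sequential,
  multisession = tweak(multisession, workers = 2L)
)
## Forked processing is only attempted where it is supported
if (supportsMulticore()) {
  strategies$multicore <- tweak(multicore, workers = 2L)
}

## Resolve the same future under each strategy; record "ok" or the error message
outcome <- character(0)
for (name in names(strategies)) {
  plan(strategies[[name]])
  outcome[[name]] <- tryCatch({
    value(future(my_custom_method(2)))
    "ok"
  }, error = function(e) conditionMessage(e))
}

plan(sequential)  ## reset the plan
print(outcome)
```

The loop runs at the top level rather than inside a function so that each future is created from the global environment, as in the original example.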

HenrikBengtsson commented 2 years ago

Hi, I can reproduce this. I was surprised that it worked with multisession but not with sequential and multicore; normally it's the other way around. However, it turns out that this is most likely related to the recent bug https://github.com/HenrikBengtsson/future/issues/608. The workaround is the same: set the (hidden) option future.globals.keepWhere to TRUE, as in:

library(future)
library(methods)
options(future.globals.keepWhere = TRUE)

setGeneric("my_custom_method", function(x) {
    standardGeneric("my_custom_method")
})

setMethod("my_custom_method", methods::signature(x = "numeric"), function(x) {
  x^2
})

plan(sequential)                   ## works with future.globals.keepWhere = TRUE
# plan(multicore, workers = 2L)    ## works with future.globals.keepWhere = TRUE
# plan(multisession, workers = 2L)

f <- future({ my_custom_method(2) })
v <- value(f)
print(v)
stopifnot(v == my_custom_method(2))
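Because future.globals.keepWhere is a hidden option, it may be preferable to enable it only around the futures that need it rather than for the whole session. The helper below, `value_with_keep_where()`, is hypothetical and not part of the future API; it assumes the option is read when the future is created, as the example above implies, and it restores the previous setting on exit.

```r
library(future)
library(methods)

setGeneric("my_custom_method", function(x) standardGeneric("my_custom_method"))
setMethod("my_custom_method", signature(x = "numeric"), function(x) x^2)

## Hypothetical helper: evaluate `expr` as a future with the hidden option
## future.globals.keepWhere = TRUE, restoring the previous value on exit
value_with_keep_where <- function(expr, envir = parent.frame()) {
  expr <- substitute(expr)                           ## capture the unevaluated call
  old <- options(future.globals.keepWhere = TRUE)    ## returns the previous value
  on.exit(options(old), add = TRUE)                  ## NULL means "unset" again
  f <- future(expr, envir = envir, substitute = FALSE)
  value(f)
}

plan(sequential)
v <- value_with_keep_where(my_custom_method(2))
print(v)
```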