DavisVaughan / furrr

Apply Mapping Functions in Parallel using Futures
https://furrr.futureverse.org/

future_map raises an error when .f contains a call to a Python function imported via the reticulate package #241

Closed: abdellah19jan closed this issue 2 years ago

abdellah19jan commented 2 years ago

Below is the R code:

library(reticulate)
library(furrr)
library(purrr)

plan(multisession, workers = 7)

use_python(paste0("C:/Users/", Sys.getenv("USERNAME"), "/Anaconda3/python.exe"),
           required = TRUE
)

source_python("test.py")

.f <- function(x) py_f(x)

map(c(1, 2), .f)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
future_map(c(1, 2), .f)
#> Error in unserialize(node$con) : 
#>   MultisessionFuture (<none>) failed to receive results from cluster RichSOCKnode #1 (PID 6488 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. Detected a non-exportable reference (‘externalptr’) in one of the globals (‘py_f’ of class ‘python.builtin.function’) used in the future expression. The total size of the 11 globals exported is 44.38 KiB. The three largest globals are ‘py_f’ (17.96 KiB of class ‘function’), ‘py_resolve_dots’ (11.99 KiB of class ‘function’) and ‘...furrr_map_fn’ (6.61 KiB of class ‘function’)

And below is the Python code in the test.py file:

def py_f(x):
    return x

DavisVaughan commented 2 years ago

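Python functions are non-exportable objects, which future (the framework underlying furrr) has no way to send to parallel workers. Each reticulate Python object wraps an external pointer ('externalptr') into the Python interpreter embedded in the main R session, and that pointer is meaningless in a separate worker process. See this part of the error: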

Detected a non-exportable reference (‘externalptr’) in one of the globals (‘py_f’ of class ‘python.builtin.function’) used in the future expression.

https://future.futureverse.org/articles/future-4-non-exportable-objects.html

That article even has a section specifically about reticulate: https://future.futureverse.org/articles/future-4-non-exportable-objects.html#package-reticulate
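As a sketch of how to surface this problem earlier: future's `future.globals.onReference` option (default `"ignore"`) can be set to `"error"` so that globals are checked for such references when each future is created, instead of the problem only showing up after a worker has died. This assumes reticulate can find a working Python. Here `py_f` is defined inline with `py_run_string()` as a stand-in for `test.py`, and the exact error text may differ between future versions.

```r
library(reticulate)
library(furrr)

# Ask future to check globals for external pointers when each future is
# created and signal an error, rather than letting a worker crash later.
options(future.globals.onReference = "error")

plan(multisession, workers = 2)

# Stand-in for source_python("test.py"): define py_f in Python's __main__
# module and bind it in R. It wraps an 'externalptr', so it is not exportable.
py_run_string("def py_f(x): return x")
py_f <- py$py_f

.f <- function(x) py_f(x)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    future_map(c(1, 2), .f)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")

plan(sequential)  # shut down the workers
```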

Something like the following might work, since each worker loads reticulate and sources test.py itself, but in general I think you are going to have a tough time combining R parallelism with reticulate.

library(furrr)

plan(multisession, workers = 7)

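.f <- function(x) {
  # Everything Python-related happens inside the worker: load reticulate,
  # select the interpreter, and source test.py there, so no Python object
  # has to be exported from the main R session.
  library(reticulate)

  use_python(paste0("C:/Users/", Sys.getenv("USERNAME"), "/Anaconda3/python.exe"),
             required = TRUE
  )

  # Defines py_f in this function's environment on the worker
  source_python("test.py")

  py_f(x)
}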

future_map(c(1, 2), .f)
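The same idea, sketched with a Python module imported inside the worker via `reticulate::import()`. This is an illustration, not a tested fix for the issue: the standard-library `math` module stands in for `test.py`, and the commented-out `RETICULATE_PYTHON` line (hypothetical path) assumes that an environment variable set before `plan()` is inherited by the worker processes that `plan()` launches, as an alternative to calling `use_python()` inside `.f`.

```r
library(furrr)

# Optional (hypothetical path): pin the interpreter before plan() so the
# worker processes launched by plan() inherit the environment variable.
# Sys.setenv(RETICULATE_PYTHON = "C:/path/to/python.exe")

plan(multisession, workers = 2)

.f <- function(x) {
  # Import the Python module inside the worker. Only plain R values
  # (the element x and the returned number) cross the process boundary,
  # so nothing non-exportable is ever serialized.
  math <- reticulate::import("math")  # stdlib module as a stand-in for test.py
  math$sqrt(x)
}

res <- future_map(c(1, 4, 9), .f)
str(res)

plan(sequential)  # shut down the workers
```

The trade-off is that every worker starts its own Python interpreter, so this only pays off when each call does enough work to outweigh that startup cost.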