DavisVaughan / furrr

Apply Mapping Functions in Parallel using Futures
https://furrr.futureverse.org/
Other
698 stars 39 forks source link

Q: non-exportable reference ('externalptr') #168

Closed HenrikBengtsson closed 2 years ago

HenrikBengtsson commented 4 years ago

(From https://stackoverflow.com/questions/64258355/non-exportable-references-in-future-map)

Do you where this 'externalptr' might come from?

$ R --vanilla
future::plan("multisession", workers = 2L)
options(future.globals.onReference = "error")
y <- furrr::future_map(1:2, identity)
## Error: Detected a non-exportable reference ('externalptr') in one
##  of the globals (<unknown>) used in the future expression

This is with both release and devel version of furrr.

From a quick debug, the following globals are exported:

[18:44:15.331] - globals passed as-is: [6] ‘...furrr_fn’, ‘...furrr_dots’, ‘...furrr_map_fn’, ‘...furrr_chunk_args’, ‘...furrr_chunk_seeds’, ‘...furrr_globals_max_size’

Not sure, but I suspect it's ...furrr_map_fn:

function (.x, .f, ...) 
{
    .f <- as_mapper(.f, ...)
    .Call(map_impl, environment(), ".x", ".f", "list")
}
<bytecode: 0x56247d4be4f0>
<environment: namespace:purrr>

Is that a copy of a non-exported function of purrr?

DavisVaughan commented 4 years ago

This is odd...I'm having trouble making a stand alone example of the bug.

One thing that I do think is a bug in globals is that if a global is NULL, then a where environment gets attached to it:

fn <- function() {
  x <- list(null = NULL, one = 1)
  globals::as.Globals(x)
}

fn()
#> $null
#> NULL
#> 
#> $one
#> [1] 1
#> 
#> attr(,"where")
#> attr(,"where")$null
#> <environment: 0x7fa5ad858e70>
#> 
#> attr(,"where")$one
#> <environment: R_EmptyEnv>
#> 
#> attr(,"class")
#> [1] "Globals" "list"

Created on 2020-10-09 by the reprex package (v0.3.0.9001)

This is because as.Globals.list() calls environment(obj) but if obj is NULL then it just returns the current environment. I think for globals you'd want to catch if obj is NULL and say that the "where" environment is the empty environment

> globals:::as.Globals.list
function(x, ...) {
  ## Use the globals environments as the locals?
  ## (with emptyenv() as the fallback)
  where <- attr(x, "where", exact = TRUE)
  if (is.null(where)) {
    where <- lapply(x, FUN = function(obj) {
        e <- environment(obj)
        if (is.null(e)) e <- emptyenv()
        e
    })
    names(where) <- names(x)
    attr(x, "where") <- where
  }

  Globals(x, ...)
}

Another place where this problem could happen is in c.Globals https://github.com/HenrikBengtsson/globals/blob/c9a514701b50c71f7328424664ff8affb69c3288/R/Globals-class.R#L148-L149

And maybe here https://github.com/HenrikBengtsson/globals/blob/d8a08aba1c3aa0aae49f245cce06f68679aa0d11/R/packagesOf.R#L17

DavisVaughan commented 4 years ago

I don't think ...furrr_map_fn is the problem. I think it has something to do with ‘...furrr_chunk_args’, ‘...furrr_chunk_seeds’, ‘...furrr_globals_max_size’. These are three globals that i insert as placeholders up front

https://github.com/DavisVaughan/furrr/blob/bf5e45c78b16bb72849a2a2c31940ebf1e76048f/R/globals.R#L25-L32

When these are converted to Globals objects, that where environment gets attached to them and that is what seems to have a externalptr in it, although i have no idea where it comes from. I can't see it in the environment, but when I run the environment through assert_no_references() I get the error

edo91 commented 4 years ago

As I wrote between the comments on stackoverflow, the problem may not be related to furrr since future.apply shows the same issue:

future::plan("multisession", workers = 2L)
options(future.globals.onReference = "error")
y <- future.apply::future_lapply(1:2, identity)
## Error: Detected a non-exportable reference ('externalptr') in one
##  of the globals (<unknown>) used in the future expression
DavisVaughan commented 4 years ago

Oddly, when I step through the code, it only errors on the second future of future_map(1:2, identity)

DavisVaughan commented 4 years ago

I believe that because of the environment bug listed above, somehow serialize() is getting access to the environment where the list of futures that furrr / future.apply creates lives, and it is actually the future itself that has an external pointer in it 🤔

library(future)

plan(multisession, workers = 2)
options(future.globals.onReference = "error")
x <- future(1)
future:::assert_no_references(x)
#> Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression

Created on 2020-10-09 by the reprex package (v0.3.0.9001)

Ah, its the socket node worker objects!

future:::assert_no_references(x$workers[[1]]$con)
#> Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression

This explains why it only happens on the second future as well. When assert_no_references() is run the first time, there aren't any futures created yet so it never hits that connection object. The second time, it somehow sees the future created from the first iteration and that has the connection object in it.

This definitely feels like a future / globals bug

DavisVaughan commented 4 years ago

This is currently the easiest way i can come up with to reproduce this weirdness

library(future)
options(future.globals.onReference = "error")

# use multisession because these futures have connection objects in them
plan(multisession, workers = 2)

fn <- function(globals) {
  envs <- lapply(globals, function(x) {
    # if x is `NULL`, this returns the current env!
    env <- environment(x) 

    # emulate globals:::as.Globals.list
    if (is.null(env)) {
      env <- emptyenv()
    }

    env
  })

  future:::assert_no_references(envs)
  print("all good - 1")

  x <- future(1) # make a multisession future in the env

  future:::assert_no_references(envs)
  print("all good - 2")
}

globals <- list(x = 1, y = 1)
fn(globals)
#> [1] "all good - 1"
#> [1] "all good - 2"

globals <- list(x = NULL, y = NULL)
fn(globals)
#> [1] "all good - 1"
#> Error: Detected a non-exportable reference ('externalptr') in one of the globals ('x' of class 'environment') used in the future expression

Created on 2020-10-09 by the reprex package (v0.3.0.9001)

HenrikBengtsson commented 4 years ago

Thank you.

Just in case someone spots this here; as someone followed up on SO, this happens also with future.apply, e.g.

library(future)
plan(multisession, workers = 2)
options(future.globals.onReference = "error")
future.apply::future_lapply(1:2, identity)
## Error: Detected a non-exportable reference ('externalptr') in one
## of the globals (<unknown>) used in the future expression

Yes, it looks like it's picking up something from the surrounding environment. I'll investigate - this is non-critical but still interesting and it could reveal some inefficiency, e.g. exporting too much. I'll see if I can future:::assert_no_references() to give some more information that <unknown> in Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression.

PS. Would you mind keeping this issue open and assigning this issue to be until I have had time to transfer it to the future issue tracker? You can also add a wont-fix label.

HenrikBengtsson commented 4 years ago

Only out of curiosity, can this issue be transferred as-is to HenrikBengtsson/future here on GitHub? (You initiate, I accept - not sure if that works)

DavisVaughan commented 4 years ago

Sadly no:

You can only transfer issues between repositories owned by the same user or organization account.

https://docs.github.com/en/free-pro-team@latest/github/managing-your-work-on-github/transferring-an-issue-to-another-repository

HenrikBengtsson commented 3 years ago

I finally got some time to look into this. First, @DavisVaughan, if you don't mind, I'll keep using this existing furrr issue for the problem, because you've got some useful troubleshooting above. Second, your troubleshooting is spot on. Although I haven't gone all the way, I what's happening is that the reference is found via attribute where in the list of the globals, and the where attribute was either not there or broken in the past but then fixed causing this bug to pop up (I'll follow up later when I can track down from where this is). I also have an idea of the first, but I want to first track down why it worked before and not recently.

For now, I've updated future to give more details on the problem (https://github.com/HenrikBengtsson/future/commit/6f751fad1ab2e9c70c7ccc2f0c09d2217e9f964e), which confirms your troubleshoot above;

remotes::install_github("HenrikBengtsson/future", ref="6f751fa")

Then,

future::plan("multisession", workers = 2L)
options(future.globals.onReference = "error")
y <- future.apply::future_lapply(1:2, identity)
# Error: Detected a non-exportable reference (‘externalptr’) in one of the globals (‘...future.elements_ii’ of class ‘integer’) used in the future expression

As we see from the error message, this "doesn't make sense", but it's most likely because it's picked up via the where attribute that is associated with the list element ...future.elements_ii.

DavisVaughan commented 2 years ago

Fixed upstream in future by https://github.com/HenrikBengtsson/future/commit/09f9b7d2c0e3a475085f35c8c4eb36ecd465db55

Steviey commented 1 year ago

+1