Closed HenrikBengtsson closed 2 years ago
This is odd...I'm having trouble making a stand alone example of the bug.
One thing that I do think is a bug in globals is that if a global is NULL
, then a where
environment gets attached to it:
fn <- function() {
x <- list(null = NULL, one = 1)
globals::as.Globals(x)
}
fn()
#> $null
#> NULL
#>
#> $one
#> [1] 1
#>
#> attr(,"where")
#> attr(,"where")$null
#> <environment: 0x7fa5ad858e70>
#>
#> attr(,"where")$one
#> <environment: R_EmptyEnv>
#>
#> attr(,"class")
#> [1] "Globals" "list"
Created on 2020-10-09 by the reprex package (v0.3.0.9001)
This is because as.Globals.list()
calls environment(obj)
but if obj
is NULL
then it just returns the current environment. I think for globals you'd want to catch if obj
is NULL
and say that the "where" environment is the empty environment
> globals:::as.Globals.list
function(x, ...) {
## Use the globals environments as the locals?
## (with emptyenv() as the fallback)
where <- attr(x, "where", exact = TRUE)
if (is.null(where)) {
where <- lapply(x, FUN = function(obj) {
e <- environment(obj)
if (is.null(e)) e <- emptyenv()
e
})
names(where) <- names(x)
attr(x, "where") <- where
}
Globals(x, ...)
}
Another place where this problem could happen is in c.Globals
https://github.com/HenrikBengtsson/globals/blob/c9a514701b50c71f7328424664ff8affb69c3288/R/Globals-class.R#L148-L149
And maybe here https://github.com/HenrikBengtsson/globals/blob/d8a08aba1c3aa0aae49f245cce06f68679aa0d11/R/packagesOf.R#L17
I don't think ...furrr_map_fn
is the problem. I think it has something to do with ‘...furrr_chunk_args’, ‘...furrr_chunk_seeds’, ‘...furrr_globals_max_size’
. These are three globals that i insert as placeholders up front
When these are converted to Globals objects, that where
environment gets attached to them and that is what seems to have a externalptr in it, although i have no idea where it comes from. I can't see it in the environment, but when I run the environment through assert_no_references()
I get the error
As I wrote between the comments on stackoverflow, the problem may not be related to furrr
since future.apply
shows the same issue:
future::plan("multisession", workers = 2L)
options(future.globals.onReference = "error")
y <- future.apply::future_lapply(1:2, identity)
## Error: Detected a non-exportable reference ('externalptr') in one
## of the globals (<unknown>) used in the future expression
Oddly, when I step through the code, it only errors on the second future of future_map(1:2, identity)
I believe that because of the environment bug listed above, somehow serialize()
is getting access to the environment where the list of futures that furrr / future.apply creates lives, and it is actually the future itself that has an external pointer in it 🤔
library(future)
plan(multisession, workers = 2)
options(future.globals.onReference = "error")
x <- future(1)
future:::assert_no_references(x)
#> Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression
Created on 2020-10-09 by the reprex package (v0.3.0.9001)
Ah, its the socket node worker objects!
future:::assert_no_references(x$workers[[1]]$con)
#> Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression
This explains why it only happens on the second future as well. When assert_no_references()
is run the first time, there aren't any futures created yet so it never hits that connection object. The second time, it somehow sees the future created from the first iteration and that has the connection object in it.
This definitely feels like a future / globals bug
This is currently the easiest way i can come up with to reproduce this weirdness
library(future)
options(future.globals.onReference = "error")
# use multisession because these futures have connection objects in them
plan(multisession, workers = 2)
fn <- function(globals) {
envs <- lapply(globals, function(x) {
# if x is `NULL`, this returns the current env!
env <- environment(x)
# emulate globals:::as.Globals.list
if (is.null(env)) {
env <- emptyenv()
}
env
})
future:::assert_no_references(envs)
print("all good - 1")
x <- future(1) # make a multisession future in the env
future:::assert_no_references(envs)
print("all good - 2")
}
globals <- list(x = 1, y = 1)
fn(globals)
#> [1] "all good - 1"
#> [1] "all good - 2"
globals <- list(x = NULL, y = NULL)
fn(globals)
#> [1] "all good - 1"
#> Error: Detected a non-exportable reference ('externalptr') in one of the globals ('x' of class 'environment') used in the future expression
Created on 2020-10-09 by the reprex package (v0.3.0.9001)
Thank you.
Just in case someone spots this here; as someone followed up on SO, this happens also with future.apply, e.g.
library(future)
plan(multisession, workers = 2)
options(future.globals.onReference = "error")
future.apply::future_lapply(1:2, identity)
## Error: Detected a non-exportable reference ('externalptr') in one
## of the globals (<unknown>) used in the future expression
Yes, it looks like it's picking up something from the surrounding environment. I'll investigate - this is non-critical but still interesting and it could reveal some inefficiency, e.g. exporting too much. I'll see if I can future:::assert_no_references()
to give some more information that <unknown>
in Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression
.
PS. Would you mind keeping this issue open and assigning this issue to be until I have had time to transfer it to the future issue tracker? You can also add a wont-fix
label.
Only out of curiosity, can this issue be transferred as-is to HenrikBengtsson/future here on GitHub? (You initiate, I accept - not sure if that works)
Sadly no:
You can only transfer issues between repositories owned by the same user or organization account.
I finally got some time to look into this. First, @DavisVaughan, if you don't mind, I'll keep using this existing furrr issue for the problem, because you've got some useful troubleshooting above. Second, your troubleshooting is spot on. Although I haven't gone all the way, I what's happening is that the reference is found via attribute where
in the list of the globals, and the where
attribute was either not there or broken in the past but then fixed causing this bug to pop up (I'll follow up later when I can track down from where this is). I also have an idea of the first, but I want to first track down why it worked before and not recently.
For now, I've updated future to give more details on the problem (https://github.com/HenrikBengtsson/future/commit/6f751fad1ab2e9c70c7ccc2f0c09d2217e9f964e), which confirms your troubleshoot above;
remotes::install_github("HenrikBengtsson/future", ref="6f751fa")
Then,
future::plan("multisession", workers = 2L)
options(future.globals.onReference = "error")
y <- future.apply::future_lapply(1:2, identity)
# Error: Detected a non-exportable reference (‘externalptr’) in one of the globals (‘...future.elements_ii’ of class ‘integer’) used in the future expression
As we see from the error message, this "doesn't make sense", but it's most likely because it's picked up via the where
attribute that is associated with the list element ...future.elements_ii
.
Fixed upstream in future by https://github.com/HenrikBengtsson/future/commit/09f9b7d2c0e3a475085f35c8c4eb36ecd465db55
+1
(From https://stackoverflow.com/questions/64258355/non-exportable-references-in-future-map)
Do you where this 'externalptr' might come from?
This is with both release and devel version of furrr.
From a quick debug, the following globals are exported:
Not sure, but I suspect it's
...furrr_map_fn
:Is that a copy of a non-exported function of purrr?