futureverse / future.apply

:rocket: R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org

Not exporting variable that is assigned again in apply function #47

Closed renkun-ken closed 5 years ago

renkun-ken commented 5 years ago

Consider the following cases:

library(future)
plan(multisession, workers = 3)

a <- 1
b <- 2

# case 1
res <- future.apply::future_lapply(1:3, function(i) {
  if (TRUE) {
    a <- a + 1
  }
  a
})

# case 2
res <- future.apply::future_lapply(1:3, function(i) {
  if (FALSE) {
    a <- a + 1
  }
  a
})

# case 3
res <- future.apply::future_lapply(1:3, function(i) {
  if (a < b) {
    a <- a + 1
  }
  a
})

# case 4
res <- future.apply::future_lapply(1:3, function(i) {
  if (a > b) {
    a <- a + 1
  }
  a
})

Cases 2-4 fail with the following error:

Error in ...future.FUN(...future.X_jj, ...) : object 'a' not found

It seems that if a global variable is assigned again inside a condition that is FALSE or cannot be determined up front, the global variable is not exported to the worker.

This behavior is inconsistent with how future itself determines which global variables to export to workers:

task1 <- future({
  if (TRUE) {
    a <- a + 1
  }
  a
})

task2 <- future({
  if (FALSE) {
    a <- a + 1
  }
  a
})

task3 <- future({
  if (a < b) {
    a <- a + 1
  }
  a
})

task4 <- future({
  if (a > b) {
    a <- a + 1
  }
  a
})

res <- value(list(task1, task2, task3, task4))

which produces the correct results without the error.

HenrikBengtsson commented 5 years ago

Thank you for reporting. Here is a smaller illustration of what I think you're reporting on and that clarifies that a should really be a global variable:

library(future)
plan(cluster, workers = "localhost")

a <- 1

y1 <- future.apply::future_lapply(1, function(i) {
  if (TRUE) a <- a + 1
  a
})

y2 <- future.apply::future_lapply(1, function(i) {
  if (FALSE) a <- a + 1
  a
})

Using options(future.debug = TRUE), we can see that a is identified as a global in the y1 case whereas it is not in the y2 case.
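As a rough way to look at the detection outside of a future_lapply call, the globals package (which future relies on for finding globals) exposes findGlobals(). The sketch below runs it on an expression mirroring the y2 body. The names expr, method, and found are illustrative only, and whether a is reported by each method depends on the installed globals version, so no output is shown.

```r
library(globals)

# Expression mirroring the y2 case: 'a' is read on the RHS and
# conditionally assigned on the LHS.
expr <- quote({
  if (FALSE) a <- a + 1
  a
})

# Compare the search strategies offered by findGlobals().
for (method in c("ordered", "conservative", "liberal")) {
  found <- findGlobals(expr, method = method)
  cat(method, ": 'a' detected = ", "a" %in% found, "\n", sep = "")
}
```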

And, yes, in contrast, using bare-bones futures, we see that a is identified as a global in both cases:

f1 <- future({
  if (TRUE) a <- a + 1
  a
})
v1 <- value(f1)

f2 <- future({
  if (FALSE) a <- a + 1
  a
})
v2 <- value(f2)

Now, what's odd is that:

y3 <- future.apply::future_lapply(1, function(i) {
  b <- a
  a <- a + 1
  TRUE
})

fails to identify a as a global, whereas:

y4 <- future.apply::future_lapply(1, function(i) {
  b <- a
  TRUE
})

works.

It looks related to using a <- a + 1 where RHS is a global variable whereas LHS is a local variable(*). I'll investigate. (There's something way back in my head that this is on a todo-list from before, but I don't trust my memory anymore). I'll flag it as a bug for now.

(*) It's highly recommended not to use such ambiguous constructs in parallel processing. This is related to the "reset" example shown in https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html.
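A minimal sketch of two ways to sidestep the ambiguous a <- a + 1 construct, assuming only the future_lapply() argument future.globals and otherwise hypothetical names (a_local, y_renamed, y_explicit). It is not a description of the eventual fix in globals, just a way to avoid depending on automatic detection.

```r
library(future)
library(future.apply)
plan(sequential)  # keeps the sketch light; swap in multisession to exercise worker export

a <- 1

# Option 1: treat the global 'a' as read-only and use a distinct local name.
y_renamed <- future_lapply(1, function(i) {
  a_local <- a
  if (FALSE) a_local <- a_local + 1
  a_local
})

# Option 2: declare the global explicitly instead of relying on detection.
y_explicit <- future_lapply(1, function(i) {
  if (FALSE) a <- a + 1
  a
}, future.globals = list(a = a))
```

Option 1 removes the name clash altogether, while Option 2 keeps the original function body and hands the value of a to the workers by name.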

renkun-ken commented 5 years ago

Thanks for pointing to the vignettes. I'll avoid such usage at the moment.

HenrikBengtsson commented 5 years ago

UPDATE: This has been fixed in the develop version of the globals package.

HenrikBengtsson commented 5 years ago

I've now also added package tests for future.apply that will test for this when globals (> 0.12.4) is released. I'm closing since there's nothing else to do in the future.apply package.
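For readers who want to check whether their installed globals is newer than the 0.12.4 mentioned above, a small sketch follows. The variable names installed and has_newer_globals are illustrative, and the threshold is taken only from the comment above, not from a release note.

```r
# Compare the installed version of 'globals' against 0.12.4.
installed <- utils::packageVersion("globals")
has_newer_globals <- installed > "0.12.4"

if (!has_newer_globals) {
  message("globals ", installed, " is not newer than 0.12.4")
}
```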

renkun-ken commented 5 years ago

Thanks! I'll test it soon.

geryan commented 4 years ago

A note for future reference: I've been getting the same error, Error in ...future.FUN(...future.X_jj, ...) : object 'result' not found, under a future_lapply call to a function that internally creates (assigns) and then returns an object called result.

The problem can be fixed by changing the object name inside the function to something other than result.

I can't create a simple reproducible example, but I'm guessing the problem might be occurring because of a conflict with future's use of result.

HenrikBengtsson commented 4 years ago

Thxs @geryan. Reproducible examples are always useful, so please share. BTW, does

remotes::install_github("HenrikBengtsson/globals@develop")

fix your problem?