Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

object not found error with foreach (cedta.override) #805

Open jrowen opened 9 years ago

jrowen commented 9 years ago

This is similar to #769 and https://github.com/rstudio/rmarkdown/issues/187. It's not clear to me if DEoptim needs to be added to data.table's whitelist or if there is a problem with how foreach loads data.table on each core.

The toy example below (partially borrowed from the DEoptim vignette) runs without issue when executed sequentially:

library(data.table)
library(DEoptim)
library(doParallel)

obj = function(x, mkt) {
  #assignInNamespace("cedta.override", c(data.table:::cedta.override, "DEoptim"), "data.table")

  tmp = mkt[A > 0.5, sum(B)]

  10*length(x)+sum(x^2-10*cos(2*pi*x))
}

set.seed(1234)
DT_opt_ = data.table(A=rnorm(100), B=rnorm(100))

#registerDoParallel()
outDEoptim = DEoptim(obj, upper = 1, lower = 0,
                     control = DEoptim.control(
                       parallelType = 2,
                       foreachArgs = list(
                         .packages = c("data.table"))
                     ),
                     mkt = DT_opt_)

However, when it is run in parallel using foreach on a Windows machine (uncomment the registerDoParallel() call in the example), the following error is generated:

Error in { : task 1 failed - "object 'B' not found"

The error can be eliminated by uncommenting the assignInNamespace call at the top of the obj function, which appends "DEoptim" to data.table's cedta.override.
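For reference, here is a minimal sketch of that workaround factored into a helper. The name add_cedta_override is hypothetical and not part of data.table; the sketch assumes the installed data.table still has an internal cedta.override object, as the versions in this report do. It patches an internal, so treat it as a stopgap rather than a supported API.

```r
library(data.table)

# Hypothetical helper (not part of data.table): append a package name to
# data.table's internal cedta.override vector if it is not already there.
# Assumes the internal object `cedta.override` exists in this version.
add_cedta_override <- function(pkg) {
  ns <- asNamespace("data.table")
  current <- get("cedta.override", envir = ns)
  if (!pkg %in% current) {
    assignInNamespace("cedta.override", c(current, pkg), ns = "data.table")
  }
  invisible(get("cedta.override", envir = ns))
}

# Objective function from the report, with the workaround applied on entry
# so that it also runs on each foreach worker.
obj <- function(x, mkt) {
  add_cedta_override("DEoptim")
  tmp <- mkt[A > 0.5, sum(B)]
  10 * length(x) + sum(x^2 - 10 * cos(2 * pi * x))
}
```

Calling the helper inside obj matters here because each worker loads its own copy of data.table, so a change made only in the master session would not reach the workers.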

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
[1] doParallel_1.0.8 iterators_1.0.7  foreach_1.4.2   
[4] DEoptim_2.2-2    data.table_1.9.2

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.1.1  plyr_1.8.1     
[4] Rcpp_0.11.2     reshape2_1.4    stringr_0.6.2  
[7] tools_3.1.1

arunsrinivasan commented 9 years ago

I'm not able to reproduce this issue on 1.9.3. Could you please verify? Installation instructions are in the README.md file. Thank you.

jrowen commented 9 years ago

I'm still seeing the same result with 1.9.3:

Error in { : task 1 failed - "object 'B' not found"
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
[1] doParallel_1.0.8 iterators_1.0.7  foreach_1.4.2   
[4] DEoptim_2.2-2    data.table_1.9.3 devtools_1.5    

loaded via a namespace (and not attached):
 [1] codetools_0.2-8 compiler_3.1.1  digest_0.6.4   
 [4] evaluate_0.5.5  httr_0.5        memoise_0.2.1  
 [7] plyr_1.8.1      Rcpp_0.11.2     RCurl_1.95-4.3 
[10] reshape2_1.4    stringr_0.6.2   tools_3.1.1    
[13] whisker_0.3-2

logufu commented 9 years ago

On Thursday, 18 September 2014 9:00, jrowen notifications@github.com wrote:

I'm still seeing the same result with 1.9.3.. Error in { : task 1 failed - "object 'B' not found"


jrowen commented 8 years ago

I'm running into this issue again with 1.9.6.

library(DEoptim)
library(data.table)
library(doParallel)

registerDoParallel(cores = detectCores() * 0.75)

genrose.f <- function(x) {
  # assignInNamespace("cedta.override",
  #                   c(data.table:::cedta.override,"DEoptim"),
  #                   "data.table")

  DT = data.table(A = 1:10)
  DT[, `:=`(B = 1:10)]

  n <- length(x)
  fval <- 1.0 + sum (100 * (x[1:(n-1)]^2 - x[2:n])^2 + (x[2:n] - 1)^2)
  return(fval)
}
ctrl = DEoptim.control(
  parallelType = 2, 
  foreachArgs = list(
    .packages = c("data.table")
  ), itermax=100)

DEoptim(genrose.f, rep(-5, 10), rep(5, 10), control=ctrl)

Error message:

Error in apply(i, 1, fn, ...) : 
  task 1 failed - "Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=")."

Uncomment the assignInNamespace call and it works fine. The error only appears when I run using foreach as the parallel engine (parallelType = 2).

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3     data.table_1.9.6  DEoptim_2.2-3    

loaded via a namespace (and not attached):
[1] drat_0.1.0       compiler_3.2.2   tools_3.2.2      codetools_0.2-14 chron_2.3-47

jrowen commented 8 years ago

I can also confirm that this issue only arises on Windows when a parallel backend is used with foreach (parallelType = 2). If I use the control settings below, which rely on the parallel package instead, there is no error.

ctrl = DEoptim.control(
  parallelType = 1, 
  packages = c("data.table"),
  itermax=100)

arunsrinivasan commented 8 years ago

@jrowen thanks for the updates. Will be useful when we work on this.

MichaelChirico commented 5 years ago

Is anyone still able to reproduce this? Are we sure it's on the data.table side? cc @Hong-Revo

iago-pssjd commented 1 year ago

I have a similar issue on Linux... (not a minimal example yet)