tedholzman closed this issue 3 years ago
Hi Ted,
Sorry to hear that; hopefully we can figure this out. Could you give me a bit more information, please:
If you use snowSuperLearner rather than CV.SuperLearner, does the analysis complete?
Thanks, Chris
@ck37 This looks vaguely familiar to me. When you try to use `parLapply(cl = NULL, X, fun, ...)` on a function that contains an argument named `X`, it gets confused because `X` matches two arguments. However, that's why the `.crossValFun()` wrapper function uses an argument called `dataX` instead of `X`, so I am confused as to why it's hitting this error.
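The kind of argument-matching collision behind this error can be reproduced in base R without a cluster. The sketch below is only an illustration: `inner_apply`, `outer_apply`, and `square` are hypothetical stand-ins, not the parallel package's code. It assumes, based on the error message quoted in this thread, that the wrapper forwards `...` into an inner call that already sets a lowercase `x =`. R's argument matching is case-sensitive, so a lowercase `x =` passed to a function whose formal is uppercase `X` falls into `...` and is then supplied a second time to the inner function.

```r
# Hypothetical stand-ins; NOT the parallel package's actual code.

# Inner function with a lowercase formal `x`.
inner_apply <- function(x, fun, ...) {
  lapply(x, fun, ...)
}

# Wrapper with an uppercase formal `X` that forwards `...` to the inner call.
outer_apply <- function(X, fun, ...) {
  inner_apply(x = X, fun = fun, ...)
}

square <- function(v) v^2

# Naming the uppercase formal works as expected.
ok <- outer_apply(X = 1:3, fun = square)

# A lowercase `x =` does not match `X`, so it lands in `...` and is passed
# to inner_apply() alongside the explicit `x = X`, which triggers the error.
err <- tryCatch(
  outer_apply(x = 1:3, fun = square),
  error = function(e) conditionMessage(e)
)
print(err)
```

If the same mechanism applies here, the lowercase `x = folds` in the quoted `parLapply` call would be worth checking, since `dataX` alone would not prevent a collision on `x`.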
@tedholzman The second error you're getting, "cannot allocate 4G vector", means you've hit a memory limitation. You may be able to get around that by using fewer CV folds. However, if your data is just too big, you could try using the subsemble package (allowing you to keep your same base learner library), the h2oEnsemble package or the h2o package. These alternatives I listed do not yet have a built-in outer cross-validation function (a la `CV.SuperLearner()`), so you'd also have to write some extra code for that.
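As a rough idea of what that extra code might look like, here is a minimal sketch of an outer V-fold cross-validation loop written by hand. Everything in it is an assumption for illustration: `outer_cv`, `fit_lm`, and `predict_lm` are hypothetical helpers, and `lm()` is only a stand-in learner, not the subsemble, h2oEnsemble, or h2o API. The `V` argument sets the number of folds.

```r
# Hypothetical helper: outer V-fold cross-validation around any fit/predict pair.
outer_cv <- function(Y, X, V, fit_fun, predict_fun) {
  n <- length(Y)
  # Randomly assign each row to one of V folds of near-equal size.
  fold_id <- sample(rep(seq_len(V), length.out = n))
  preds <- numeric(n)
  for (v in seq_len(V)) {
    test <- fold_id == v
    # Fit on the training rows only, then predict the held-out fold.
    fit <- fit_fun(Y[!test], X[!test, , drop = FALSE])
    preds[test] <- predict_fun(fit, X[test, , drop = FALSE])
  }
  list(pred = preds, fold = fold_id, mse = mean((Y - preds)^2))
}

# Stand-in learner (assumption): ordinary least squares via lm().
fit_lm <- function(Y, X) lm(Y ~ ., data = data.frame(Y = Y, X))
predict_lm <- function(fit, newX) {
  as.numeric(predict(fit, newdata = data.frame(newX)))
}

# Simulated data, purely for demonstration.
set.seed(1)
X <- data.frame(a = rnorm(100), b = rnorm(100))
Y <- 2 * X$a - X$b + rnorm(100, sd = 0.1)

res <- outer_cv(Y, X, V = 5, fit_fun = fit_lm, predict_fun = predict_lm)
print(res$mse)
```

Swapping in a different ensemble would mean replacing `fit_lm` and `predict_lm` with wrappers around that package's own fit and predict calls.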
Hi.
I am trying to do a number of analyses with snowSuperLearner and CV.SuperLearner. CV.SuperLearner fails with this error:
Error in clusterApply(cl, x = splitList(X, length(cl)), fun = lapply, : formal argument "x" matched by multiple actual arguments Calls: system.time ... CV.SuperLearner -> parLapply -> do.call -> clusterApply
The offending CV.SuperLearner call looks like this:
system.time(sl_cv_fit <- CV.SuperLearner(Y = Y, X = X, SL.library = SL.library, verbose = TRUE, method = "method.NNLS", cvControl = list(V = 10), parallel = cl, control = list(saveFitLibrary = TRUE)))
cl is a FORK type cluster with 10 nodes.
The statement within CV.SuperLearner that fails appears to be:
cvList <- parLapply(parallel, x = folds, fun = .crossValFun, Y = Y, dataX = X, family = family, SL.library = SL.library, method = method, id = id, obsWeights = obsWeights, verbose = verbose, control = control, cvControl = cvControl, saveAll = saveAll)
It is being run on a 64-bit computer with a very large memory capacity. The R version is 3.3.3.
Oh. This error occurs as soon as control hits that parLapply call. On a different computer with less memory (same R version), it fails with a "cannot allocate 4G vector" error -- after about 10 hours of computing.
Can you give me any advice?
Thanks. --Ted