HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

Problems changing from foreach to doFuture #12

Closed: pat-s closed this 7 years ago

pat-s commented 7 years ago

Hi Henrik,

I am about to replace the doParallel backend of my foreach implementation with doFuture, which I want to use as the default for both sequential and parallel execution.

In general this should be easy: I only need to replace the doParallel lines

cl <- makeCluster(par.args$par.units, outfile = out.progress)
registerDoParallel(cl)

with

library(doFuture)
registerDoFuture()
plan(multisession)  # or any other strategy, e.g. sequential or cluster
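A minimal sketch of this switch, assuming the foreach, future and doFuture packages are installed; the toy loop and workers = 2 are illustrative choices, not sperrorest code:

```r
library(foreach)
library(future)
library(doFuture)

# Register doFuture as the %dopar% backend (replaces registerDoParallel(cl))
registerDoFuture()

# Choose how futures are resolved; swap in sequential, cluster, etc. as needed
plan(multisession, workers = 2)

# An unchanged foreach loop now runs via the future framework
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2
print(squares)

# Reset to sequential processing when done
plan(sequential)
```

Because doFuture only swaps the backend, the foreach() calls themselves stay the same; the execution mode is controlled entirely by plan().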

However, I'm running into several problems and need your help again.

remotes::install_github("pat-s/sperrorest@doFuture")

Parallel modes

multisession

With plan(multisession), I don't get any console output from the workers (should there be any at all?):

library(sperrorest)
data(ecuador)
# 'fo' is the model formula (definition not shown)
nspres <- sperrorest(data = ecuador, formula = fo,
                     model.fun = glm, model.args = list(family = "binomial"),
                     pred.fun = predict, pred.args = list(type = "response"),
                     smp.fun = partition.cv,
                     smp.args = list(repetition = 1:6, nfold = 3),
                     par.args = list(par.mode = "foreach"),
                     benchmark = TRUE)

Using 'foreach()' parallel mode with 4 cores.
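If the worker output is only needed after the loop finishes, one workaround is to capture it inside each iteration and return it alongside the result. A minimal sketch, assuming a toy loop (the "Repetition ..." messages and the msgs field are hypothetical, not sperrorest output):

```r
library(foreach)
library(future)
library(doFuture)

registerDoFuture()
plan(multisession, workers = 2)

# Capture what each iteration prints on its worker and return it with the
# result, instead of relying on the worker's console being visible
results <- foreach(i = 1:3) %dopar% {
  value <- i * 2
  msgs <- capture.output(cat("Repetition", i, "done, value =", value, "\n"))
  list(value = value, msgs = msgs)
}

plan(sequential)

# Replay the collected worker output in the main R session
for (r in results) writeLines(r$msgs)
values <- vapply(results, function(r) r$value, numeric(1))
```

This sidesteps the question of whether a given backend relays worker output at all, at the cost of seeing nothing until the iterations complete.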

multiprocess

The following example throws an error. I assume it is caused by forking again (macOS)? It also fails on Linux (Ubuntu 16.04). sequential and multisession work for this example.

library(sperrorest)
library(MASS)  # provides lda()

lda.predfun <- function(object, newdata, fac = NULL) {
  library(nnet)  # which.is.max(), loaded here so it is available on workers

  # Most frequent class in x
  majority <- function(x) {
    levels(x)[which.is.max(table(x))]
  }

  # Replace the predictions within each level of 'fac' by their majority class
  majority.filter <- function(x, fac) {
    for (lev in levels(fac)) {
      x[fac == lev] <- majority(x[fac == lev])
    }
    x
  }

  pred <- predict(object, newdata = newdata)$class
  if (!is.null(fac)) pred <- majority.filter(pred, newdata[, fac])
  return(pred)
}

fo <- croptype ~ b12 + b13 + b14 + b15 + b16 + b17 + b22 + b23 + b24 +
  b25 + b26 + b27 + b32 + b33 + b34 + b35 + b36 + b37 + b42 +
  b43 + b44 + b45 + b46 + b47 + b52 + b53 + b54 + b55 + b56 +
  b57 + b62 + b63 + b64 + b65 + b66 + b67 + b72 + b73 + b74 +
  b75 + b76 + b77 + b82 + b83 + b84 + b85 + b86 + b87 + ndvi01 +
  ndvi02 + ndvi03 + ndvi04 + ndvi05 + ndvi06 + ndvi07 + ndvi08 +
  ndwi01 + ndwi02 + ndwi03 + ndwi04 + ndwi05 + ndwi06 + ndwi07 +
  ndwi08

data(maipo)

# error.rep = TRUE, error.fold = TRUE
out <- sperrorest(fo, data = maipo, coords = c("utmx", "utmy"),
                  model.fun = lda,
                  pred.fun = lda.predfun,
                  smp.fun = partition.cv,
                  par.args = list(par.mode = "foreach"),
                  smp.args = list(repetition = 1:10, nfold = 4),
                  error.rep = TRUE, error.fold = TRUE,
                  benchmark = TRUE, progress = TRUE)

Error: length(results) == nbr_of_elements is not TRUE
pat-s commented 7 years ago

Error: length(results) == nbr_of_elements is not TRUE

This error only occurs with plan(multiprocess), i.e. multicore (forked) processing on macOS/Linux. multisession works fine.
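To narrow down which backend is at fault, one can run the same loop under each plan and record whether it succeeds. A minimal sketch, assuming a toy loop: run_loop() is a hypothetical stand-in for the failing sperrorest() call, and future::supportsMulticore() is used to skip forking where it is unavailable. It uses multicore directly, since multiprocess resolves to multicore wherever forking is supported:

```r
library(foreach)
library(future)
library(doFuture)

registerDoFuture()

# Hypothetical stand-in for the sperrorest() call: any foreach loop will do
run_loop <- function() {
  foreach(i = 1:4, .combine = c) %dopar% sqrt(i)
}

# Strategies to compare; multicore (forking) is only added where supported
strategies <- list(sequential = sequential, multisession = multisession)
if (supportsMulticore()) strategies$multicore <- multicore

status <- vapply(names(strategies), function(name) {
  oplan <- plan(strategies[[name]])  # plan() returns the previous plan
  on.exit(plan(oplan))               # restore it after this check
  tryCatch({
    res <- run_loop()
    if (isTRUE(all.equal(res, sqrt(1:4)))) "ok" else "unexpected result"
  }, error = function(e) conditionMessage(e))
}, character(1))

print(status)
```

A strategy that fails shows its error message instead of "ok", which makes it easy to report which plan triggers an error like the one above.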

pat-s commented 7 years ago

It seems that worker console output is only shown when using plan(cluster) with the following setup:

cl <- makeCluster(availableCores(), outfile = out.progress)
plan(cluster, workers = cl)

where out.progress has to be set according to the OS (see ?makeCluster).

Neither multiprocess nor multisession supports console output from the workers. Is this correct?
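A self-contained version of this setup, as a minimal sketch assuming two local workers and a toy loop. makeCluster() discards worker output by default (/dev/null, or nul: on Windows); outfile = "" disables that redirection. This reflects the setup discussed here; later future releases changed how worker output is handled, so check the current documentation:

```r
library(parallel)
library(foreach)
library(future)
library(doFuture)

registerDoFuture()

# outfile = "" means no redirection, so worker output is not discarded
cl <- makeCluster(2, outfile = "")
plan(cluster, workers = cl)

res <- foreach(i = 1:2, .combine = c) %dopar% {
  cat("worker", Sys.getpid(), "handles i =", i, "\n")
  i * 10
}

# Switch away from the cluster before stopping it
plan(sequential)
stopCluster(cl)
```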

HenrikBengtsson commented 7 years ago

On May 24, 2017 03:04, "Patrick Schratz" wrote:

multiprocess and multisession both do not support console output of the workers. Is this correct?

Yes, this is correct. The approach above using outfile does write to the console, but note that this output is not sent via the stdout/stderr of your main R process. This means it cannot be captured, redirected with sink(), etc., which the user might not want. It may also not be displayed in some R GUIs.
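The claim that outfile output cannot be captured in the main session can be checked with the parallel package alone. A minimal sketch, assuming a one-worker local cluster and an illustrative message:

```r
library(parallel)

cl <- makeCluster(1, outfile = "")

# The worker writes to its own process-level stdout, bypassing the main
# session's R connections, so capture.output() here collects nothing
captured <- capture.output(
  invisible(clusterEvalQ(cl, cat("hello from worker\n")))
)

stopCluster(cl)
print(length(captured))
```

The worker's line may still appear in the terminal, but it never reaches the captured character vector.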

I have an open issue in the https://github.com/HenrikBengtsson/future repository that discusses the wish for a generic, standard way of capturing stdout/stderr from futures. I think (I'm offline on a flight right now) there is also an issue about progress reporting/progress bars from active futures. Both feature requests share the same challenging question of how and when to communicate such information. I don't know of a simple solution for this.


pat-s commented 7 years ago

Okay, thanks for the summary! I'm curious about the further development of future.