futureverse / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org
954 stars 85 forks source link

Splitting/teeing messages/warnings/cat so they also appear in batchtools log files #424

Closed rubenarslan closed 4 years ago

rubenarslan commented 5 years ago

Re this SO post:

I'm using future to submit long-running jobs (brms fits) to a Torque cluster by nesting remote and qsub future. If I can retrieve the fit from the same pipe, that's great, but often the jobs are so long-running that pipes will break. Then, I lose all logging, including error messages.

Also, brms logs model fitting progress by echoing iterations via cat. I could use these to monitor job progress, but not if future captures them.

rubenarslan commented 5 years ago

Any update on this? Should I try my hand at a PR?

HenrikBengtsson commented 5 years ago

Hi, sorry for the apparent radio silence. I think the solution is to split stdout and that has to be done by the (internal) future expression. I've started https://github.com/HenrikBengtsson/future/issues/294. For conditions (e.g. messages and warnings) to be "split", they need to be outputted locally when captured and then later relayed upstream.

The above is a design issue and less of an implementation issue per se. We have to make sure to get this correct for all backends which is why it's slow progress. You mention a PR - did you have specific ideas on how this could be added/implemented?

A "workaround" until above is implemented is to specify stdout = NA when creating the futures. That will stop any capturing - this is what the future framework did in the past.

rubenarslan commented 5 years ago

Hey @HenrikBengtsson. Ah great, I didn't see the other issue. No, I didn't have any idea how to implement the tee-ing, but I thought I could try if I can implement a toggle to bring back the old behavior. Apparently, that exists with stdout = NA although I haven't yet made it work. I'll try further.

HenrikBengtsson commented 5 years ago

Apparently, that exists with stdout = NA although I haven't yet made it work. I'll try further.

Note that this is only for stdout output. Messages and warnings can be disable similarly using conditions = character(0).

What you cannot disable is capturing of errors. This means that error messages do not end up in the batchtools produced log files. The https://github.com/HenrikBengtsson/future/issues/294 issue is also meant to tackle that need.

rubenarslan commented 5 years ago

Sorry, can you explain more how this would work? This does not work

      qsub <- tweak(batchtools_torque, template = template,
                    resources = list(job.name = job_name,
                                     queue = queue,
                                     ncpus = ncpus),
                    conditions = character(0),
                    stdout = NA)
      plan(qsub)

It says

In tweak.future(batchtools_torque, template = template, resources = list(job.name = job_name, : Ignored 2 unknown arguments: ‘conditions’, ‘stdout’

HenrikBengtsson commented 5 years ago

The stdout and conditions argument apply to the future() function.

This also means they are in control of the developer, not the person who only controls the plan(). There is a goal to give that user some control too (splitting/mirroring locally). I updated https://github.com/HenrikBengtsson/future/issues/294#issuecomment-477426168 to clarify this.

HenrikBengtsson commented 4 years ago

Being able to split/tee the standard output and likely also conditions (particularly messages and warnings) is also helpful when debugging in sequential mode. Currently, when entering the browser() debugger while resolving a future expression any stdout is sunk rendering the interactive debugger "useless".

The current workaround is to call sink(NULL) until stdout is no longer captured. The downside is that it'll break whatever the future is capturing.

Transfering this issue to the future issue tracker because it is non-specific to the future.batchtools package

HenrikBengtsson commented 4 years ago

Added support for plan(..., split = TRUE) to the develop branch of future. Here are a few proof-of-concept examples:

Example with a sequential future:

library(future)
plan(sequential, split = TRUE)

> f <- future({ print(1:3); message("Hello world"); warning("boom"); })
[1] 1 2 3
Hello world
Warning message:
In eval(quote({ : boom

> v <- value(f)
[1] 1 2 3
Hello world
Warning message:
In eval(quote({ : boom

Example with batchtools future:

> library(future.batchtools)
> options(future.delete = FALSE)  ## keep batchtools registry folders
> plan(batchtools_local, split = TRUE)
> f <- future({ print(1:3); message("Hello world"); warning("boom") })
> v <- value(f)
[1] 1 2 3
Hello world
Warning message:
In eval(quote({ : boom

## The output can also be found in the batchtools log files
> print(loggedOutput(f))
 [1] "### [bt]: This is batchtools v0.9.13"                                            
 [2] "### [bt]: Starting calculation of 1 jobs"                                        
 [3] "### [bt]: Setting working directory to '/home/hb/repositories/future.batchtools'"
 [4] "### [bt]: Memory measurement disabled"                                           
 [5] "### [bt]: Starting job [batchtools job.id=1]"                                    
 [6] "### [bt]: Setting seed to 15034 ..."                                             
 [7] "[1] 1 2 3"                                                                       
 [8] "Hello world"                                                                     
 [9] "Warning in eval(quote({ : boom"                                                  
[10] ""                                                                                
[11] "### [bt]: Job terminated successfully [batchtools job.id=1]"                     
[12] "### [bt]: Calculation finished!"    

With split = TRUE, we can also debug sequential futures. Previously, the output by browser() would not have been visible because it was captured. Here is how it works now:

library(future)
plan(sequential, split = TRUE)

fcn <- function(x) {
  x <- x^2
  y <- sqrt(x)
  y
}

debug(fcn)
f <- future({ fcn(3) })
debugging in: fcn(3)
debug at #1: {
    x <- x^2
    y <- sqrt(x)
    y
}
Browse[2]> 
debug at #2: x <- x^2
Browse[2]> 
debug at #3: y <- sqrt(x)
Browse[2]> x
[1] 9
Browse[2]> 
debug at #4: y
Browse[2]> y
[1] 3
Browse[2]> c
exiting from: fcn(3)

The downside is that the browser() output from debugging is captured and relayed by the future. That is, we get:

> v <- value(f)
debugging in: fcn(3)
debug at #1: {
    x <- x^2
    y <- sqrt(x)
    y
}
[1] "x"
[1] 3
debug at #2: x <- x^2
[1] 3
debug at #3: y <- sqrt(x)
[1] 9
debug at #4: y
[1] 3
exiting from: fcn(3)
HenrikBengtsson commented 4 years ago

Passed revdep check on 140 packages.