futureverse / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

WISH: Add message when future is resolved #31

Closed russellpierce closed 8 years ago

russellpierce commented 8 years ago

Before I discovered future I wrote mcparallelDo. For almost every purpose I run into, future does a better job of obtaining the results I was seeking when I wrote mcparallelDo. The only feature I think mcparallelDo has that is (seemingly) absent from future is embodied in the verbose argument, i.e. the ability to notify a user in an interactive session when a multicore evaluation has completed. Personally, I find that feature awfully convenient.

Glancing around briefly, it seems that registering a callback handler that watches resolved() could work, or perhaps something in the FutureRegistry could accomplish that?

HenrikBengtsson commented 8 years ago

So, in order to know if a future is resolved, we have to poll it. I doubt we can have push functionality for this that works in general. Remember that the future package is designed to work also with futures of other types than the built-in eager, lazy and multicore futures. For instance, the BatchJobs package can run jobs on various types of clusters and there will be futures for those cases as well. In order to know whether a cluster job/future is completed or not, we need to poll the job. There is no mechanism for jobs/futures to "call back". It might be that this is possible for futures running in the same session (i.e. eager, lazy and multicore), but otherwise we cannot assume this.

Could this "done" message (or callback) simply be done at the end of the future expression? For example:

x %<=% {
   # Processing
   message("DONE")
   value
}

This would work with any type of future, including the ones that are evaluated on a cluster (although there the output won't reach the main session).

FYI, Issue #30, about adding a resolved() function for querying/polling multiple futures at once, is somewhat related to this.
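
The polling approach described above can be sketched roughly as follows. This is only an illustration, assuming a recent version of the future package in which future(), resolved(), value() and the multisession backend are available; notify_when_resolved() is a hypothetical helper name, not something the package provides.

```r
library(future)

# Hypothetical helper (not part of the future package): poll a future until
# it is resolved, then announce completion from the main session.
notify_when_resolved <- function(f, interval = 0.1) {
  while (!resolved(f)) {
    Sys.sleep(interval)  # pause between polls to avoid busy-waiting
  }
  message("DONE")
  invisible(f)
}

# Assumption: any asynchronous backend would do; multisession is used here
# because it does not rely on forking.
plan(multisession, workers = 2)

f <- future({
  Sys.sleep(1)  # stand-in for real processing
  42
})

notify_when_resolved(f)
result <- value(f)

plan(sequential)  # shut down the background workers
```

Note that this loop blocks the main session while it waits, so in an interactive session one would more likely call resolved(f) every now and then rather than loop on it.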

russellpierce commented 8 years ago

You're right, any solution like that would have to be future-type specific. That seems like a reasonable reason to close this WISH.

Regarding your example, it doesn't seem to work for plan(multicore), perhaps for similar reasons to why multicore doesn't pass forward warning messages. Even if it did, I suspect it would work as in plan(lazy), i.e., only telling you it was done when you queried the target object, which doesn't quite meet the desired behavior.

HenrikBengtsson commented 8 years ago

Output to stderr and stdout with multicore futures should work, because multicore utilizes parallel::mcparallel(..., silent=FALSE). In other words, both stdout and stderr should simply be sent to the stdout and stderr of the main process.

Example:

> library("future")

> plan(eager)
> x %<=% { cat("Hello\n"); message("DONE"); TRUE }
Hello
DONE
> x
[1] TRUE
>
> plan(lazy)
> x %<=% { cat("Hello\n"); message("DONE"); TRUE }
> x
Hello
DONE
[1] TRUE
>
> plan(multicore)
> x %<=% { cat("Hello\n"); message("DONE"); TRUE }
> Hello
DONE

> x
[1] TRUE

One could imagine an argument/option that appends a call to a hook expression at the end of each future expression. For instance,

> plan(multicore, onDone=quote({ message("DONE") }))
> x %<=% { TRUE }
> DONE

> x
[1] TRUE

That would effectively be the same as if you had manually used:

x %<=% {
  value <- { TRUE }  ## Original future expression
  { message("DONE") }
  value
}
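
To make the rewriting step concrete, here is a small sketch in base R only, with no future involved. The helper name append_hook() is hypothetical and just for illustration; it builds the combined expression with substitute() and then evaluates it directly.

```r
# Hypothetical helper (not part of the future package): wrap 'expr' so that
# 'hook' runs after it, while the value of 'expr' is still what is returned.
append_hook <- function(expr, hook) {
  substitute({ value <- a; b; value }, list(a = expr, b = hook))
}

original <- quote({ TRUE })
hook     <- quote({ message("DONE") })

wrapped <- append_hook(original, hook)
print(wrapped)  # inspect the rewritten expression

# Evaluate in a fresh environment so the temporary 'value' variable
# does not leak into the calling environment.
result <- eval(wrapped, envir = new.env())
```

One thing to be aware of with this design is that the wrapper introduces a variable called value in the environment where the expression is evaluated, so an original expression that relies on its own variable of that name could be affected.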

However, before adding such bells and whistles I really want the core API to be up and running so we have a good understanding of what the stable API should look like and how futures should behave. If we discover that a wished-for feature is incompatible for some reason, it's costly to remove. In the meantime, the above should be very easy to achieve by defining your own future, e.g.
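```r
# Wrapper around multicore() that optionally appends an 'onDone' hook
# expression to the end of the future expression.
multicore2 <- function(expr, envir=parent.frame(), onDone=NULL, substitute=TRUE, ...) {
  if (substitute) expr <- substitute(expr)
  if (!is.null(onDone)) {
    # Evaluate the original expression, run the hook, then return the value
    expr <- substitute({ value <- a; b; value }, list(a=expr, b=onDone))
  }
  multicore(expr, envir=envir, substitute=FALSE, ...)
}
```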

and

> plan(multicore2)
> x %<=% { TRUE }
> x
[1] TRUE

> plan(multicore2, onDone=quote({ message("DONE") }))
> x %<=% { TRUE }
DONE
> x
[1] TRUE

russellpierce commented 8 years ago

Interestingly (but perhaps unsurprisingly), RStudio (.99.489) seems to be doing something untoward here and swallowing both the cat and the message output from the forked process. Maybe it has something to do with how rsession talks back to RStudio (not that one should really be using forks in RStudio to begin with, given the dire warnings regarding forks and GUIs). When I run from the console directly, I do see the behavior you describe. Thanks for specifying a complete example.

HenrikBengtsson commented 8 years ago

Thanks for the updates on RStudio - good to know. I'm hardly ever using it and when I do, I only have it available on Windows so I cannot check the multicore behavior myself.

Closing issue since there is an easy workaround for now.