Hi, we have been considering doing something in this area. One thing to keep in mind is that a future can be executed on a different computer (BatchJobs, batchtools, etc.). Some ideas:
I wonder if any of those can be incorporated into future proper; the idea of collecting progress/status/messages from a background process seems to be common enough to warrant a standard solution.
Being able to get progress updates from external processes is a recurring feature request in R. I agree that it would be a very useful feature to have.
I think the best approach is not to implement this machinery in the future package per se. Instead, I can imagine an "ultimate message passing / progress bar" package that supports many different use cases (in line with what @alexvorobiev suggests; thanks) and exposes a small generic API that can be used by futures and any other parallel framework.
A few random thoughts that this "ultimate" package needs to consider:
The above should/could be implemented and designed independently of the Future API, but could be used by futures as follows (very rough illustration):
## Hypothetical progress-bar object, e.g. communicating via ZeroMQ
pb <- ultimate_progress_bar(via = "zmq")

f <- future({
  ## The worker reports progress as it goes
  progress_bar_set(pb, 0)
  some_call()
  progress_bar_set(pb, 0.25)
  some_other_calls()
  progress_bar_set(pb, 0.75)
  res <- conclude()
  progress_bar_set(pb, 1)
  res
})

## The master polls the progress until the future is resolved
while (!resolved(f)) {
  message("Progress: ", progress_bar_get(pb))
  Sys.sleep(0.5)
}
Thanks for the replies, both! I don't have the knowledge of R to be able to code this kind of thing, but if anybody tries it please keep this issue updated on progress. I agree with @HenrikBengtsson that it probably doesn't belong in the future package, and so if you want to close this as "wontfix" then that's fine.
In the special case that the workers are on the same machine (I'm using the multiprocess plan), is there anything that already exists that might make this possible?
You can have a file that is shared by the master process and the future, so that the future can communicate back to the master whenever it wants and the master can poll / peek at the file whenever it wants.
I suggest letting the file size represent the progress, such that a zero file size is 0% progress and a 100-byte file is 100% progress. The future can then append one or more characters to the file to update the progress. By using file.size() one can query the progress at any time from anywhere, including from the master process. Here's an example illustrating the idea:
library("future")
plan(multisession)
## Setup empty progress-bar file
pb_file <- tempfile()
cat("", file = pb_file)
## Launch future
f <- future({
for (ii in 1:100) {
## Update progress
cat(".", file = pb_file, append = TRUE)
## Some more work
Sys.sleep(0.05)
}
Sys.getpid()
})
## Poll progress every 0.1 s until done
while (!resolved(f)) {
## Report on progress
cat(sprintf("\rProgress: %g%%", file.size(pb_file)), file = stderr())
Sys.sleep(0.1)
}
cat("\n", file = stderr())
## Done
v <- value(f)
print(v)
I'd imagine you could do something like that while() loop in Shiny too, e.g. running some function every x seconds that updates the display.
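For instance, here is a minimal sketch (my own, not from the thread) of how a Shiny app could poll that progress file; it assumes pb_file and the future writing to it were set up as in the example above, and it simply re-reads the file size roughly once per second via invalidateLater():

library(shiny)

ui <- fluidPage(
  textOutput("progress")
)

server <- function(input, output, session) {
  output$progress <- renderText({
    invalidateLater(1000, session)  ## re-evaluate roughly every second
    sprintf("Progress: %g%%", file.size(pb_file))
  })
}

shinyApp(ui, server)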
FYI, the FileProgressBar class of the R.utils package provides a progress-bar API for the above idea. But it might be just as easy to implement it using bare-bones R code, as the above example does.
Just a note: The processx package (only on GitHub right now) provides a nice mechanism for launching and interacting with background processes on the current machine. One feature it has is the ability to poll the stdout and stderr of such background processes. I can imagine a future.parallelx package that would utilize these features for what's needed here. However, it's important to come up with a natural API for reading the stdout / stderr of futures (and I'm not yet sure what the conceptual model should be).
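As a rough illustration (my own sketch, not part of the comment above), this is what polling a background process's stdout with processx can look like; the child command and its output are arbitrary placeholders:

library(processx)

## Launch a background R process whose stdout we can poll (placeholder work)
p <- process$new(
  "Rscript",
  c("-e", "for (i in 1:5) { cat('step', i, '\\n'); Sys.sleep(1) }"),
  stdout = "|", stderr = "|"
)

## Poll its stdout until it finishes
while (p$is_alive()) {
  p$poll_io(500)                    ## wait up to 500 ms for new output
  lines <- p$read_output_lines()    ## read any complete lines of stdout
  if (length(lines) > 0) message("worker: ", paste(lines, collapse = " | "))
}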
UPDATE: Progress updates can now be signaled in a near-live fashion when using the progressr package. The gist is:
slow <- function(x) { Sys.sleep(1.0); sqrt(x) }

snail <- function(x) {
  p <- progressor(along = x)
  y <- future_sapply(x, function(z) {
    p(paste0("z=", z, " by ", Sys.getpid()))
    slow(z)
  })
  sum(y)
}
where the user can get progress updates with:
> library(future)
> library(future.apply)  ## provides future_sapply()
> library(progressr)
> plan(multisession)
> x <- 1:10              ## example input vector
> y <- with_progress(snail(x))
For my code snippet in https://github.com/HenrikBengtsson/future/issues/141#issuecomment-292743988 we would have to do something like:
library(future)
plan(multisession, workers = 2L)

library(progressr)
handlers("txtprogressbar")

## Listen and report on progress updates
with_progress({
  ## Create progressor
  p <- progressr::progressor(100L)

  ## Launch future
  f <- future({
    for (ii in 1:100) {
      p()  ## signal progress
      ## Some more work
      Sys.sleep(0.05)
    }
    Sys.getpid()
  })

  ## Poll every 0.1 s until done
  while (!resolved(f)) {
    Sys.sleep(0.1)
  }
})

## Done
v <- value(f)
print(v)
I'm closing this since progressr implements a solution for near-live progress updates with optional message strings, e.g. progress(msg). Additionally, one can use progress(msg, class = "sticky") to indicate that the progress handler should make the message stick, if supported; e.g. if a progress bar in the terminal is used, a "sticky" message will be pushed above the progress bar and stay there permanently, just like a regular message. Sticky messages are supported since progressr 0.6.0 (2020-05-18).
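As a small sketch of what a sticky update could look like in the snail()-style example above (my own illustration; the input vector and message text are arbitrary, and it assumes the progressor function p() accepts class = "sticky" in the way sticky messages are described above):

library(future)
library(future.apply)
library(progressr)
plan(multisession)
handlers("txtprogressbar")

x <- 1:10
with_progress({
  p <- progressor(along = x)
  y <- future_sapply(x, function(z) {
    ## A "sticky" message is pushed above the progress bar and stays there
    p(sprintf("finished z = %g", z), class = "sticky")
    Sys.sleep(0.5)
    sqrt(z)
  })
})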
I was wondering if this is possible. I have a long-running function which, when run normally, outputs its progress to the console. (It's running MCMC and just outputs every k iterations to say it's got that far.)
When not using future, I think this means it would be possible to capture the output and update a progress bar whenever this happens (by cleverly interpreting the output).
I was wondering if there would be any way to implement this within a future? When calling the function within a future, there is obviously no output to my console (which is fine). I realise you'd probably have to poll at some interval to be able to do this, but that's fine. In my specific use case, it would be sufficient to poll every second to obtain the most recent output from the function (I'm already polling the function every second to see if it's done). I would then use this to update a progress bar in my R Shiny app.
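For what it's worth, here is a rough sketch of one way this could work on the same machine, along the lines of the shared-file approach discussed above; long_task() is a placeholder of mine standing in for the MCMC function, and it assumes a multisession worker on the same host can write to a file created by the master:

library(future)
plan(multisession)

## Shared log file that the worker writes to and the master reads from
log_file <- tempfile()
file.create(log_file)

## Placeholder for the real MCMC function: prints its progress as it runs
long_task <- function() {
  for (i in seq(100, 1000, by = 100)) {
    cat("iteration", i, "\n")
    Sys.sleep(0.2)
  }
  "done"
}

## Redirect the function's console output to the shared file
f <- future({
  sink(log_file)
  on.exit(sink())
  long_task()
})

## Poll once per second and report the most recent line of output
while (!resolved(f)) {
  lines <- suppressWarnings(readLines(log_file))
  if (length(lines) > 0) message("Latest: ", tail(lines, 1))
  Sys.sleep(1)
}
value(f)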