Open DavisVaughan opened 4 years ago
Upon more study, this works with initiate = FALSE
.
It requires setting steps
(which is reasonable), but also requires setting auto_finish
. I wonder if that could default to TRUE
somewhere around here? https://github.com/HenrikBengtsson/progressr/blob/470b94b5760c78a9ec7973eb0fb69c850f18477e/R/make_progression_handler.R#L318
library(progressr)
library(purrr)
set.seed(123)
x <- replicate(n = 10, runif(20), simplify = FALSE)
p <- progressor(initiate = FALSE)
slow_fn <- function(x) {
p()
Sys.sleep(.1)
sum(x)
}
with_progress({
p(type = "initiate", steps = length(x), auto_finish = TRUE)
purrr::map(x, slow_fn)
})
Yes something like that. Just for my clarification, you second example is just a proof of concept correct; I don't see an advantage of:
with_progress({
p(type = "initiate", steps = length(x), auto_finish = TRUE)
purrr::map(x, slow_fn)
})
over
with_progress({
p <- progressor(along=x)
purrr::map(x, slow_fn)
})
Or are you arguing for a different coding style rather than verbosity?
PS. Thxs for all your recent work everywhere; I just came out of a long sprint, need to slow down a bit, and catch up with backlogs at work.
I think you are right, I was just thinking about this example below from the perspective of a user. When writing slow_fn()
, I'd be "using" p()
before I ever create it, which might be somewhat confusing. Then I create p
later on in the with_progress()
expression. It just requires a bit of lexical scoping knowledge, but I think the code itself is pretty clean.
I'm working on the vignette now. I'm planning on using the style below, plus an explanation about why you generally need to create p
inside of the with_progress()
call (because it signals that first "initiate"
condition on creation). I might still include one example about how to delay that "initiate"
signal, but I won't suggest it as best practice.
x <- replicate(n = 10, runif(20), simplify = FALSE)
slow_fn <- function(x) {
p()
Sys.sleep(.1)
sum(x)
}
with_progress({
p <- progressor(along = x)
purrr::map(x, slow_fn)
})
I just came out of a long sprint
No rush! I understand, take your time.
Oh actually I don't think that creating slow_fn()
ahead of time is a viable solution. The following example won't work at all (ignoring the fact that with_progress()
currently returns NULL
, I think it should return the value of expr as suggested in #26 so you can use it in dplyr). The function env of fn()
here is the global env. p
is created in some dplyr environment, so fn()
can't find it when it tries to look for p
in the global env.
library(dplyr)
library(progressr)
library(tidyr)
df <- tibble(
g = sort(sample(10, 100, TRUE)),
x = runif(100)
)
df <- chop(df, x)
head(df, n = 2)
#> # A tibble: 2 x 2
#> g x
#> <int> <list<dbl>>
#> 1 1 [11]
#> 2 2 [13]
fn <- function(x) {
p()
sum(x)
}
df %>%
mutate(
y = with_progress({
p <- progressor(steps = length(x))
purrr::map(x, fn)
})
)
#> Error: Problem with `mutate()` input `y`.
#> x could not find function "p"
#> ℹ Input `y` is `with_progress(...)`.
Created on 2020-08-07 by the reprex package (v0.3.0)
Just a quick comment:
I might still include one example about how to delay that "initiate" signal, but I won't suggest it as best practice.
Hold off from doing that because it's internal code.
To @DavisVaughan 's point, it has more uses cases than just dplyr flows. The below fails as well, and I'm not sure why.
library(progressr)
library(purrr)
xs <- 1:10
my_fun <- function(x) {
Sys.sleep(1)
p()
sqrt(x)
}
top_level_fun <- function(x) {
p <- progressor(along = x)
y <- map(xs, my_fun)
return(y)
}
with_progress({
y <- top_level_fun(xs)
})
@DavisVaughan, I've finally got time to dive into your example in https://github.com/HenrikBengtsson/progressr/issues/83#issuecomment-670595044. While prototyping getting the evaluation environment (the infamous envir = parent.frame()
problem) correct, and looking at this example, I came up with the following minimal reproducible example without bringing in progressr:
$ R --quiet --vanilla
fn <- function(x) { p(); -x }
dplyr::mutate(data.frame(x=1), y = { p <- function() NULL; fn(x) })
## Error: Problem with `mutate()` input `y`.
## ✖ could not find function "p"
## ℹ Input `y` is `{ ... }`.
I think that illustrates what's going on, i.e. I don't think this is something that progressr or with_progress()
can solve. Maybe this is what you tried to say and you're saying that it works if p()
could be created "outside" as in:
fn <- function(x) { p(); -x }
p <- function() NULL;
dplyr::mutate(data.frame(x=1), y = { fn(x) })
## x y
## 1 1 -1
and that is what you want to do with progressr? (I think this is what you've tried to say all along but not certain)
Reading @HenrikBengtsson's answer, the below works, but it's probably against good coding principles?
I'm specifying where to look for p
library(progressr)
library(purrr)
xs <- 1:10
my_fun <- function(x) {
Sys.sleep(1)
parent.frame(n = 2)$p()
sqrt(x)
}
top_level_fun <- function(x) {
p <- progressor(along = x)
y <- map(xs, my_fun)
return(y)
}
with_progress({
y <- top_level_fun(xs)
})
@gregleleu, yes, I think your example is somewhat related. That one boils down to something like:
my_fun <- function(x) {
p()
sqrt(x)
}
top_level_fun <- function(x) {
p <- function() NULL
my_fun(x)
}
y <- top_level_fun(1:10)
## Error in p() : could not find function "p"
... the below works, but it's probably against good coding principles?
Yeah, that's a hack that I think risks becoming broken if there are changes.
I think the proper way to handle this by passing the progressor function via an argument, e.g.
my_fun <- function(x, p) {
Sys.sleep(1)
p()
sqrt(x)
}
top_level_fun <- function(x) {
p <- progressr::progressor(along = x)
y <- purrr::map(xs, my_fun, p = p)
return(y)
}
so that
y <- top_level_fun(1:5)
works and therefore also:
progressr::with_progress(y <- top_level_fun(1:5))
and upcoming:
progressr::handlers(global=TRUE)
y <- top_level_fun(1:5)
I think the proper way to handle this by passing the progressor function via an argument, e.g.
The more I think about all this, the more I think that this is the right way to go altogether. It now seems a little too magical to me to have a function like slow_fn()
below that can reach "outside itself" and find p()
to increment - p
should have been an argument.
slow_fn <- function(x) {
p()
Sys.sleep(.1)
sum(x)
}
Going back to my original example, I like adding p
as an argument here because:
We can give it a default that generates no progress to call the function "normally"
It works automatically with furrr / future.apply with no API changes required, which is sort of interesting.
With global handlers we don't need with_progress()
library(progressr)
library(furrr)
library(purrr)
set.seed(123)
plan(multisession, workers = 2)
# default so `slow_fn()` can be called without progressors
no_progress <- function() {}
x <- replicate(n = 10, runif(20), simplify = FALSE)
slow_fn <- function(x, p = no_progress) {
p()
Sys.sleep(.3)
sum(x)
}
# call slow_fn() without progress
result <- future_map(x, slow_fn)
# call slow_fn() with progress
result <- with_progress({
p <- progressor(steps = length(x))
future_map(x, slow_fn, p = p)
})
Similar, with dplyr:
library(dplyr)
library(progressr)
library(tidyr)
library(furrr)
set.seed(123)
plan(multisession, workers = 2)
no_progress <- function() {}
df <- tibble(
g = sort(sample(10, 100, TRUE)),
x = runif(100)
)
df <- chop(df, x)
head(df, 2)
#> # A tibble: 2 x 2
#> g x
#> <int> <list<dbl>>
#> 1 1 [6]
#> 2 2 [7]
fn <- function(x, p = no_progress) {
p()
Sys.sleep(.5)
sum(x)
}
# no progress
result <- df %>%
mutate(y = future_map(x, fn))
# with progress
result <- df %>%
mutate(
y = with_progress({
p <- progressor(steps = n())
future_map(x, fn, p = p)
})
)
The one thing that still doesn't work is a mutate()
with grouped data frames. Conceptually I think this should be giving 1 progress bar per group - but we get some kind of locked binding error. My guess is that the ...progressor
from the first group isn't getting unlocked before moving on to the second group.
library(dplyr)
library(progressr)
library(tidyr)
library(furrr)
library(vctrs)
set.seed(123)
plan(multisession, workers = 2)
no_progress <- function() {}
df <- tibble(
g = sort(sample(10, 100, TRUE)),
x = runif(100)
)
df <- chop(df, x)
df <- vec_rep_each(df, 2) # rep each row 2 times to make groups
head(df, 4)
#> # A tibble: 4 x 2
#> g x
#> <int> <list<dbl>>
#> 1 1 [6]
#> 2 1 [6]
#> 3 2 [7]
#> 4 2 [7]
fn <- function(x, p = no_progress) {
p()
Sys.sleep(.5)
sum(x)
}
# with progress
result <- df %>%
group_by(g) %>%
mutate(
y = with_progress({
p <- progressor(steps = n())
future_map(x, fn, p = p)
})
)
#> Error: Problem with `mutate()` input `y`.
#> x cannot change value of locked binding for '...progressor'
#> ℹ Input `y` is `with_progress(...)`.
#> ℹ The error occurred in group 2: g = 2.
If we look past the locked binding error, it is possible that I'd want to create 1 p
for the entire data frame, not one per group, but I can't do that either right now because progressor()
errors when called from the global env level. But it might look like:
p <- progressor(steps = nrow(df))
result <- df %>%
group_by(g) %>%
mutate(
y = with_progress({
future_map(x, fn, p = p)
})
)
So I've deprecated
.progress
in furrr, and was planning on writing a vignette on using progressr.My first thought was to create an example like the one below, but it errors. It seems like the
progressor()
call must be called from withinwith_progress()
. Is that right? I tried to look around for documentation about this but I can't find anything, maybe I'm just missing it.I know there is a developer/user separation of the APIs, but as of right now I think that this is how users might want to construct progress enabled functions to pass to
future_map()
. Does this seem reasonable?