HenrikBengtsson / progressr

三 R package: An Inclusive, Unifying API for Progress Updates
https://progressr.futureverse.org
281 stars 12 forks source link

Creating a progressor object outside `with_progress()`? #83

Open DavisVaughan opened 4 years ago

DavisVaughan commented 4 years ago

So I've deprecated .progress in furrr, and was planning on writing a vignette on using progressr.

My first thought was to create an example like the one below, but it errors. It seems like the progressor() call must be called from within with_progress(). Is that right? I tried to look around for documentation about this but I can't find anything, maybe I'm just missing it.

I know there is a developer/user separation of the APIs, but as of right now I think that this is how users might want to construct progress enabled functions to pass to future_map(). Does this seem reasonable?

library(progressr)
#> Warning: package 'progressr' was built under R version 4.0.2
library(purrr)

set.seed(123)

x <- replicate(n = 10, runif(20), simplify = FALSE)
p <- progressor(steps = length(x))

slow_fn <- function(x) {
  p()
  Sys.sleep(.1)
  sum(x)
}

with_progress({
  map(x, slow_fn)
})
#> Error in error("length(timestamp) == 0L") : 
#>   .validate_internal_state(‘handler(type=update) ... end’): length(timestamp) == 0L
#> Error in error("length(timestamp) == 0L") : 
#>   .validate_internal_state(‘reporter_args() ... begin’): length(timestamp) == 0L
DavisVaughan commented 4 years ago

Upon more study, this works with initiate = FALSE.

It requires setting steps (which is reasonable), but also requires setting auto_finish. I wonder if that could default to TRUE somewhere around here? https://github.com/HenrikBengtsson/progressr/blob/470b94b5760c78a9ec7973eb0fb69c850f18477e/R/make_progression_handler.R#L318

library(progressr)
library(purrr)

set.seed(123)

x <- replicate(n = 10, runif(20), simplify = FALSE)
p <- progressor(initiate = FALSE)

slow_fn <- function(x) {
  p()
  Sys.sleep(.1)
  sum(x)
}

with_progress({
  p(type = "initiate", steps = length(x), auto_finish = TRUE)
  purrr::map(x, slow_fn)
})
HenrikBengtsson commented 4 years ago

Yes something like that. Just for my clarification, you second example is just a proof of concept correct; I don't see an advantage of:

with_progress({
  p(type = "initiate", steps = length(x), auto_finish = TRUE)
  purrr::map(x, slow_fn)
})

over

with_progress({
  p <- progressor(along=x)
  purrr::map(x, slow_fn)
})

Or are you arguing for a different coding style rather than verbosity?

PS. Thxs for all your recent work everywhere; I just came out of a long sprint, need to slow down a bit, and catch up with backlogs at work.

DavisVaughan commented 4 years ago

I think you are right, I was just thinking about this example below from the perspective of a user. When writing slow_fn(), I'd be "using" p() before I ever create it, which might be somewhat confusing. Then I create p later on in the with_progress() expression. It just requires a bit of lexical scoping knowledge, but I think the code itself is pretty clean.

I'm working on the vignette now. I'm planning on using the style below, plus an explanation about why you generally need to create p inside of the with_progress() call (because it signals that first "initiate" condition on creation). I might still include one example about how to delay that "initiate" signal, but I won't suggest it as best practice.

x <- replicate(n = 10, runif(20), simplify = FALSE)

slow_fn <- function(x) {
  p()
  Sys.sleep(.1)
  sum(x)
}

with_progress({
  p <- progressor(along = x)
  purrr::map(x, slow_fn)
})

I just came out of a long sprint

No rush! I understand, take your time.

DavisVaughan commented 4 years ago

Oh actually I don't think that creating slow_fn() ahead of time is a viable solution. The following example won't work at all (ignoring the fact that with_progress() currently returns NULL, I think it should return the value of expr as suggested in #26 so you can use it in dplyr). The function env of fn() here is the global env. p is created in some dplyr environment, so fn() can't find it when it tries to look for p in the global env.

library(dplyr)
library(progressr)
library(tidyr)

df <- tibble(
  g = sort(sample(10, 100, TRUE)),
  x = runif(100)
)
df <- chop(df, x)
head(df, n = 2)
#> # A tibble: 2 x 2
#>       g           x
#>   <int> <list<dbl>>
#> 1     1        [11]
#> 2     2        [13]

fn <- function(x) {
  p()
  sum(x)
}

df %>%
  mutate(
    y = with_progress({
      p <- progressor(steps = length(x))
      purrr::map(x, fn)
    })
  )
#> Error: Problem with `mutate()` input `y`.
#> x could not find function "p"
#> ℹ Input `y` is `with_progress(...)`.

Created on 2020-08-07 by the reprex package (v0.3.0)

HenrikBengtsson commented 4 years ago

Just a quick comment:

I might still include one example about how to delay that "initiate" signal, but I won't suggest it as best practice.

Hold off from doing that because it's internal code.

gregleleu commented 3 years ago

To @DavisVaughan 's point, it has more uses cases than just dplyr flows. The below fails as well, and I'm not sure why.

library(progressr)
library(purrr)

xs <- 1:10

my_fun <- function(x) {
    Sys.sleep(1)
    p()
    sqrt(x)
}

top_level_fun <- function(x) {
  p <- progressor(along = x)
  y <- map(xs, my_fun)
  return(y)
}

with_progress({
  y <- top_level_fun(xs)
})
HenrikBengtsson commented 3 years ago

@DavisVaughan, I've finally got time to dive into your example in https://github.com/HenrikBengtsson/progressr/issues/83#issuecomment-670595044. While prototyping getting the evaluation environment (the infamous envir = parent.frame() problem) correct, and looking at this example, I came up with the following minimal reproducible example without bringing in progressr:

$ R --quiet --vanilla
fn <- function(x) { p(); -x }
dplyr::mutate(data.frame(x=1), y = { p <- function() NULL; fn(x) })
## Error: Problem with `mutate()` input `y`.
## ✖ could not find function "p"
## ℹ Input `y` is `{ ... }`.

I think that illustrates what's going on, i.e. I don't think this is something that progressr or with_progress() can solve. Maybe this is what you tried to say and you're saying that it works if p() could be created "outside" as in:

fn <- function(x) { p(); -x }
p <- function() NULL; 
dplyr::mutate(data.frame(x=1), y = { fn(x) })
##   x  y
## 1 1 -1

and that is what you want to do with progressr? (I think this is what you've tried to say all along but not certain)

gregleleu commented 3 years ago

Reading @HenrikBengtsson's answer, the below works, but it's probably against good coding principles? I'm specifying where to look for p

library(progressr)
library(purrr)

xs <- 1:10

my_fun <- function(x) {
    Sys.sleep(1)

    parent.frame(n = 2)$p()
    sqrt(x)
}

top_level_fun <- function(x) {
  p <- progressor(along = x)
  y <- map(xs, my_fun)
  return(y)
}

with_progress({
  y <- top_level_fun(xs)
})
HenrikBengtsson commented 3 years ago

@gregleleu, yes, I think your example is somewhat related. That one boils down to something like:

my_fun <- function(x) {
  p()
  sqrt(x)
}

top_level_fun <- function(x) {
  p <- function() NULL
  my_fun(x)
}

y <- top_level_fun(1:10)
## Error in p() : could not find function "p"
HenrikBengtsson commented 3 years ago

... the below works, but it's probably against good coding principles?

Yeah, that's a hack that I think risks becoming broken if there are changes.

HenrikBengtsson commented 3 years ago

I think the proper way to handle this by passing the progressor function via an argument, e.g.

my_fun <- function(x, p) {
    Sys.sleep(1)
    p()
    sqrt(x)
}

top_level_fun <- function(x) {
  p <- progressr::progressor(along = x)
  y <- purrr::map(xs, my_fun, p = p)
  return(y)
}

so that

y <- top_level_fun(1:5)

works and therefore also:

progressr::with_progress(y <- top_level_fun(1:5))

and upcoming:

progressr::handlers(global=TRUE)
y <- top_level_fun(1:5)
DavisVaughan commented 3 years ago

I think the proper way to handle this by passing the progressor function via an argument, e.g.

The more I think about all this, the more I think that this is the right way to go altogether. It now seems a little too magical to me to have a function like slow_fn() below that can reach "outside itself" and find p() to increment - p should have been an argument.

slow_fn <- function(x) {
  p()
  Sys.sleep(.1)
  sum(x)
}

Going back to my original example, I like adding p as an argument here because:

library(progressr)
library(furrr)
library(purrr)
set.seed(123)
plan(multisession, workers = 2)

# default so `slow_fn()` can be called without progressors
no_progress <- function() {}

x <- replicate(n = 10, runif(20), simplify = FALSE)

slow_fn <- function(x, p = no_progress) {
  p()
  Sys.sleep(.3)
  sum(x)
}

# call slow_fn() without progress
result <- future_map(x, slow_fn)

# call slow_fn() with progress
result <- with_progress({
  p <- progressor(steps = length(x))
  future_map(x, slow_fn, p = p)
})

Similar, with dplyr:

library(dplyr)
library(progressr)
library(tidyr)
library(furrr)
set.seed(123)
plan(multisession, workers = 2)

no_progress <- function() {}

df <- tibble(
  g = sort(sample(10, 100, TRUE)),
  x = runif(100)
)
df <- chop(df, x)
head(df, 2)
#> # A tibble: 2 x 2
#>       g           x
#>   <int> <list<dbl>>
#> 1     1         [6]
#> 2     2         [7]

fn <- function(x, p = no_progress) {
  p()
  Sys.sleep(.5)
  sum(x)
}

# no progress
result <- df %>%
  mutate(y = future_map(x, fn))

# with progress
result <- df %>%
  mutate(
    y = with_progress({
      p <- progressor(steps = n())
      future_map(x, fn, p = p)
    })
  )

The one thing that still doesn't work is a mutate() with grouped data frames. Conceptually I think this should be giving 1 progress bar per group - but we get some kind of locked binding error. My guess is that the ...progressor from the first group isn't getting unlocked before moving on to the second group.

library(dplyr)
library(progressr)
library(tidyr)
library(furrr)
library(vctrs)
set.seed(123)
plan(multisession, workers = 2)

no_progress <- function() {}

df <- tibble(
  g = sort(sample(10, 100, TRUE)),
  x = runif(100)
)
df <- chop(df, x)
df <- vec_rep_each(df, 2) # rep each row 2 times to make groups
head(df, 4)
#> # A tibble: 4 x 2
#>       g           x
#>   <int> <list<dbl>>
#> 1     1         [6]
#> 2     1         [6]
#> 3     2         [7]
#> 4     2         [7]

fn <- function(x, p = no_progress) {
  p()
  Sys.sleep(.5)
  sum(x)
}

# with progress
result <- df %>%
  group_by(g) %>%
  mutate(
    y = with_progress({
      p <- progressor(steps = n())
      future_map(x, fn, p = p)
    })
  )
#> Error: Problem with `mutate()` input `y`.
#> x cannot change value of locked binding for '...progressor'
#> ℹ Input `y` is `with_progress(...)`.
#> ℹ The error occurred in group 2: g = 2.

If we look past the locked binding error, it is possible that I'd want to create 1 p for the entire data frame, not one per group, but I can't do that either right now because progressor() errors when called from the global env level. But it might look like:

p <- progressor(steps = nrow(df))

result <- df %>%
  group_by(g) %>%
  mutate(
    y = with_progress({
      future_map(x, fn, p = p)
    })
  )