MilesMcBain / friendlyeval

A friendly interface to tidyeval/rlang that will excuse itself when you're done.

Unexpected behaviour when passing string to be evaluated as expression #7

Status: Open · leungi opened this issue 6 years ago

leungi commented 6 years ago

Below is a reprex.

The goal of the function is to let the caller supply custom variables and a custom summary function for a group_by() + summarise() pipeline.

In many tidyeval examples I see, authors use ... as a function argument to do this; however, I'd like to have an explicit argument for the function to be passed to summarise().

If there's a better way to do this, please enlighten me!

library(dplyr)
library(friendlyeval)

## this works
GroupNSummarise <- function(.data, .var, .group, .fun) {
  group <- treat_strings_as_cols(.group)
  func <- treat_string_as_expr(.fun)
  var <- treat_strings_as_cols(.var)

  .data %>% 
    group_by(!!!group) %>% 
    summarise_at(vars(.var), ~ eval(func))
}

mtcars %>% 
  GroupNSummarise(.var = c("mpg", "wt"),
                  .group = c("cyl"),
                  .fun = 'mean(., na.rm = TRUE)')
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 3 x 3
#>     cyl   mpg    wt
#>   <dbl> <dbl> <dbl>
#> 1     4  26.7  2.29
#> 2     6  19.7  3.12
#> 3     8  15.1  4.00

## this fails
GroupNSummarise2 <- function(.data, .var, .group, .fun) {
  group <- treat_strings_as_cols(.group)
  func <- treat_string_as_expr(.fun)
  var <- treat_strings_as_cols(.var)

  .data %>% 
    group_by(!!!group) %>% 
    summarise_at(vars(.var), ~ !!(func))
}

mtcars %>% 
  GroupNSummarise2(.var = c("mpg", "wt"),
                  .group = c("cyl"),
                  .fun = 'mean(., na.rm = TRUE)')
#> Error in summarise_impl(.data, dots): Evaluation error: invalid argument type.
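A likely explanation for this error, shown as a base R sketch under the assumption that nothing in the summarise_at() call unquotes inside the `.funs` formula: `~` quotes its right-hand side but does not process rlang's `!!`, so `!!(func)` survives as two calls to base R's logical NOT. When the lambda body finally runs, `!` is applied to a call object, which base R rejects with "invalid argument type". Here `func` is a stand-in for what treat_string_as_expr() returns.

```r
# Stand-in for treat_string_as_expr(.fun): the string parsed into a call.
func <- parse(text = "mean(., na.rm = TRUE)")[[1]]

# `~` quotes its right-hand side without rlang-style unquoting,
# so the body of this formula is literally !(!((func))).
f <- ~ !!(func)
body_expr <- f[[2]]

# Evaluating that body applies logical NOT to a call object,
# which base R refuses; capture the error message instead of stopping.
msg <- tryCatch(eval(body_expr), error = conditionMessage)
print(msg)
```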

MilesMcBain commented 6 years ago

Thanks for this example!

So the first case is pretty interesting to me. At this time I can't explain how it works!
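One plausible explanation for why the first version works, assuming the formula `~ eval(func)` is turned into a function whose argument is named `.` (as rlang::as_function() does for purrr-style lambdas): eval() defaults to `envir = parent.frame()`, which is the lambda's own frame, and that is exactly where `.` is bound to the current column. The base R sketch below uses a hypothetical `lambda` in place of whatever function dplyr actually builds.

```r
# Stand-in for treat_string_as_expr(.fun).
func <- parse(text = "mean(., na.rm = TRUE)")[[1]]

# Hypothetical stand-in for the function built from `~ eval(func)`:
# the column arrives as the argument named `.`.
lambda <- function(.) eval(func)

# eval(func) uses envir = parent.frame(), i.e. lambda's own frame,
# so `.` inside mean(., na.rm = TRUE) resolves to the column passed in.
result <- lambda(c(1, 2, NA, 3))
print(result)

# Evaluated in an environment where `.` is not defined, the same
# expression fails, which shows the binding comes from lambda's frame.
outside <- tryCatch(eval(func, envir = new.env(parent = baseenv())),
                    error = function(e) "error: `.` not found")
```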

One thing to think about is that to use the . placeholder with summarise_at() you need to end up passing a formula, and a formula is already a way of 'quoting' code, so why not have the caller write the formula at the top level? E.g.:

GroupNSummarise3 <- function(.data, .var, .group, .fun) {
  group <- treat_strings_as_cols(.group)
  var <- treat_strings_as_cols(.var)

  .data %>% 
    group_by(!!!group) %>% 
    summarise_at(vars(.var), .fun)
}

mtcars %>% 
  GroupNSummarise3(.var = c("mpg", "wt"),
                   .group = c("cyl"),
                   .fun = ~mean(., na.rm = TRUE))
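To make the "formula is already quoting" point concrete, a small base R check that is independent of dplyr: writing `~ mean(., na.rm = TRUE)` evaluates nothing. It only records the call and the environment it was written in, which is why GroupNSummarise3() can pass `.fun` straight through.

```r
# Creating a one-sided formula does not evaluate its right-hand side,
# so `.` does not need to exist yet.
f <- ~ mean(., na.rm = TRUE)

class(f)               # "formula"
length(f)              # 2: the `~` itself and the right-hand side
rhs <- f[[2]]          # the quoted call mean(., na.rm = TRUE)
env <- environment(f)  # the environment the formula was written in
```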

If you absolutely need to construct a formula from a string, then this seems to work:

GroupNSummarise4 <- function(.data, .var, .group, .fun) {
  group <- treat_strings_as_cols(.group)
  var <- treat_strings_as_cols(.var)
  fun <- rlang::new_formula(lhs = NULL, 
                            rhs = treat_string_as_expr(.fun))

  .data %>% 
    group_by(!!!group) %>% 
    summarise_at(vars(.var), fun)
}

mtcars %>% 
  GroupNSummarise4(.var = c("mpg", "wt"),
                   .group = c("cyl"),
                   .fun = 'mean(., na.rm = TRUE)')
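For comparison, the rlang::new_formula() step can also be done in base R alone, assuming the string holds a single valid R expression: parse it into a call, then evaluate a `~` call wrapped around it. The result has the same right-hand side as the hand-written formula passed to GroupNSummarise3().

```r
fun_string <- "mean(., na.rm = TRUE)"

# Parse the string into a call (roughly what treat_string_as_expr() gives you).
rhs <- parse(text = fun_string)[[1]]

# Build the call `~ <rhs>` and evaluate it: evaluating `~` yields a real
# formula object that records the current environment.
f_from_string <- eval(call("~", rhs))

# The equivalent formula written by hand.
f_literal <- ~ mean(., na.rm = TRUE)

same_rhs <- identical(f_from_string[[2]], f_literal[[2]])
print(same_rhs)
```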

This could be wrapped up a little more nicely into something like treat_string_as_formula(), although I note that when a function argument accepts a formula, the convention is usually to offer it as an alternative to an ordinary closure, so I'm unsure how general this would be.
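As a sketch only of how that convention might look: a hypothetical as_summary_fun() (not part of friendlyeval, dplyr, or rlang) that accepts an ordinary closure, a one-sided formula, or a string, and always returns a function. It is written in base R so it runs without any packages; a real implementation would more likely build on rlang::as_function().

```r
# Hypothetical helper: normalise a closure, a one-sided formula, or a
# string into a function whose single argument is `.`.
as_summary_fun <- function(x) {
  # Ordinary closures are returned unchanged.
  if (is.function(x)) return(x)

  # Strings are parsed and wrapped in a one-sided formula whose
  # environment is the caller's, so free variables resolve there.
  if (is.character(x) && length(x) == 1L) {
    x <- eval(call("~", parse(text = x)[[1]]), parent.frame())
  }

  # One-sided formulas become function(.) <rhs>, keeping the
  # formula's environment as the function's enclosure.
  if (inherits(x, "formula") && length(x) == 2L) {
    fn <- function(.) NULL
    body(fn) <- x[[2]]
    environment(fn) <- environment(x)
    return(fn)
  }

  stop("`x` must be a function, a one-sided formula, or a single string")
}

values <- c(1, NA, 3)
from_closure <- as_summary_fun(function(v) mean(v, na.rm = TRUE))(values)
from_formula <- as_summary_fun(~ mean(., na.rm = TRUE))(values)
from_string  <- as_summary_fun("mean(., na.rm = TRUE)")(values)
```

Returning the closure untouched keeps the ordinary-function path as the primary one, with formulas and strings layered on top, which matches the convention described above.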

leungi commented 6 years ago

Thanks for the prompt response, @MilesMcBain!

I agree that most people would avoid specifying arguments as strings, so GroupNSummarise3() is the way to go.

I appreciate the useful tip: "a formula is already a way of 'quoting' code" 👍