TimTeaFan / dplyover

Create columns by applying functions to vectors and/or columns in 'dplyr'.
https://timteafan.github.io/dplyover/

Add new functions `fold` and `fold_over` #2

Open TimTeaFan opened 3 years ago

TimTeaFan commented 3 years ago

Based on this gist, fold and fold_over might be useful add-on functions for a future version of dplyover. There should be a better name than fold for this kind of function, though.

library(dplyr)

# simulate one 7-point Likert item with n responses
likert_col <- function(n = 10) {
  sample(7, size = n, replace = TRUE)
}

# toy data
dat <- tibble(
  cat_1 = likert_col(),
  cat_2 = likert_col(),
  cat_3 = likert_col(),
  dog_1 = likert_col(),
  dog_2 = likert_col()
)

# `fold` does not exist yet
dat %>% 
  transmute(fold(starts_with("cat"),
                 list(sum = ~ rowSums(.x),
                      mean = ~ rowMeans(.x))))

# A tibble: 10 x 2
   cat_sum cat_mean
     <dbl>    <dbl>
 1      11     3.67
 2      10     3.33
 3       6     2   
 4       4     1.33
 5      10     3.33
 6       7     2.33
 7      12     4   
 8      12     4   
 9      17     5.67
10      13     4.33

# `fold_over` does not exist yet
dat %>% 
  transmute(fold_over(cut_names("_[0-9]*$"),
                      ~ starts_with(.x),
                      ~ rowSums(.x)))

# A tibble: 10 x 2
     cat   dog
   <dbl> <dbl>
 1    11    11
 2    10    10
 3     6     6
 4     4     4
 5    10    10
 6     7     7
 7    12    12
 8    12    12
 9    17    17
10    13    13
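Until something like fold_over exists, the idea can be sketched with existing tools. The snippet below is only an illustration under assumptions, not a proposed implementation: it approximates the prefix extraction that cut_names("_[0-9]*$") is used for above with base sub(), then sums the matching columns row-wise for each prefix. The small dat and the names prefixes and folded are made up for the example.

```r
library(dplyr)

# small made-up data so the result is easy to check by hand
dat <- tibble(
  cat_1 = c(1, 2),
  cat_2 = c(3, 4),
  dog_1 = c(5, 6),
  dog_2 = c(7, 8)
)

# strip the trailing "_<digits>" to get one prefix per column group
prefixes <- unique(sub("_[0-9]*$", "", names(dat)))

# for each prefix, fold the matching columns into one with rowSums()
folded <- lapply(prefixes, function(p) {
  unname(rowSums(select(dat, starts_with(p))))
})
names(folded) <- prefixes

as_tibble(folded)
```

With this toy input the cat column holds 4 and 6, and the dog column holds 12 and 14.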

TimTeaFan commented 3 years ago

I think fold would be a great extension of {dplyover}, but a better name should be found, given that {rsample} uses vfold and {furrr} also has a fold function.

Then again, fold does pretty much what it says: it folds several columns of a data.frame down into one column, for example by calculating the row means.
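To make the "folding" idea concrete, here is a minimal sketch in plain dplyr. The helper fold_sketch is hypothetical and not part of dplyover; it assumes each function takes the selected columns as a whole (like rowSums or rowMeans) and returns one value per row.

```r
library(dplyr)

# Hypothetical helper, not part of dplyover: apply each function in `fns`
# to the selected columns as a whole, giving one output column per function.
fold_sketch <- function(data, cols, fns) {
  sel <- select(data, {{ cols }})
  out <- lapply(fns, function(f) unname(f(sel)))
  as_tibble(out)
}

# small made-up data so the result is easy to check by hand
dat <- tibble(cat_1 = c(1, 2), cat_2 = c(3, 4), dog_1 = c(5, 6))

res <- fold_sketch(dat, starts_with("cat"),
                   list(sum = rowSums, mean = rowMeans))
res
```

Here res has a sum column (4, 6) and a mean column (2, 3). Unlike the fold call sketched earlier, this helper does not prefix the output names with "cat".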

vorpalvorpal commented 3 years ago

Firstly, thanks for the package. I think this has a far more common use case than Hadley suggested.

Secondly, maybe I'm misunderstanding the purpose of fold here, but wouldn't

summarise(over(starts_with("cat"),
               list(sum = ~ rowSums(.x),
                    mean = ~ rowMeans(.x))))

do the same thing? At least that way you avoid using the name "fold".

TimTeaFan commented 3 years ago

Thank you for your feedback! Unfortunately, over and the other functions in the over-across function family don't work like that. over loops over a vector and creates a new column for each element. Apart from that, over does not support tidy-select syntax in its .x argument.
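The "one new column per element" behaviour can be imitated in plain dplyr. This is only a rough emulation to show the shape of the result, not how over is implemented; the lag_ column names and the toy data are assumptions made for the example.

```r
library(dplyr)

dat <- tibble(x = 1:5)

# the vector we loop over: each element becomes one new column
lags <- 1:3

new_cols <- lapply(lags, function(k) dplyr::lag(dat$x, k))
names(new_cols) <- paste0("lag_", lags)

res <- bind_cols(dat, as_tibble(new_cols))
res
```

The result has the original x plus lag_1, lag_2 and lag_3. The columns come from the elements of lags, not from a tidy-select expression, which is why starts_with("cat") has no place in that argument.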

However, we could create a named list of data.frames on the fly as input to over and then produce a similar outcome. Dedicated functions like fold and fold_over would still be helpful, I guess, since we wouldn't need one or several select calls as input to over.

# instead of fold_over we could do:
dat %>% 
  summarise(over(list(cat = select(., starts_with("cat")),
                      dog = select(., starts_with("dog"))),
                 list(sum  = rowSums,
                      mean = rowMeans)))

#> # A tibble: 10 x 4
#>    cat_sum cat_mean dog_sum dog_mean
#>      <dbl>    <dbl>   <dbl>    <dbl>
#>  1      12     4         12      6  
#>  2      11     3.67       3      1.5
#>  3      19     6.33       4      2  
#>  4       6     2          9      4.5
#>  5       9     3         14      7  
#>  6       4     1.33       7      3.5
#>  7       7     2.33      10      5  
#>  8       8     2.67       3      1.5
#>  9       9     3          9      4.5
#> 10      10     3.33       7      3.5

Created on 2021-08-19 by the reprex package (v0.3.0)