Open TimTeaFan opened 3 years ago
I think fold would be a great extension of {dplyover}, but a better name should be found, given that {rsample} uses vfold and {furrr} also has a fold function.
Then again, fold does pretty much what it says. It folds down several columns of a data.frame to one column, for example by calculating the rowMean.
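To make the idea concrete, here is a minimal base R sketch of what "folding" several columns into one could look like. It is not dplyover code: the data frame dat and the column names cat_a, cat_b and dog_a are made up for illustration, and only base R functions (startsWith, rowMeans) are used.

```r
# Hypothetical example data; the column names are invented for illustration.
dat <- data.frame(cat_a = c(1, 2), cat_b = c(3, 4), dog_a = c(5, 6))

# Pick every column whose name starts with "cat" ...
cat_cols <- startsWith(names(dat), "cat")

# ... and fold them down into a single column holding the row means.
# drop = FALSE keeps a data.frame even if only one column matches.
dat$cat_mean <- rowMeans(dat[, cat_cols, drop = FALSE])
```

A dedicated fold function would presumably wrap the select-then-aggregate step so that it can be used inside summarise or mutate, but that interface is not specified here.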
Firstly, thanks for the package. I think this has a far more common use case than Hadley suggested.
Secondly, maybe I'm misunderstanding the purpose of fold here, but wouldn't

summarise(over(starts_with("cat"),
               list(sum  = ~ rowSums(.x),
                    mean = ~ rowMeans(.x))))

do the same thing? At least that way you avoid using the name "fold".
Thank you for your feedback! Unfortunately, over and the other functions in the over-across function family don't work like that. over loops over a vector and creates a new column for each element. Apart from that, over does not support tidy-select syntax in its .x argument.
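As a rough illustration of "loops over a vector and creates a new column for each element", the following base R sketch mimics that behaviour without dplyover. The data frame dat, the vector powers and the x_pow naming scheme are assumptions chosen for the example; the package itself may build and name its columns differently.

```r
# Hypothetical input data and the vector we want to loop over.
dat <- data.frame(x = c(1, 2, 3))
powers <- c(2, 3)

# One new column per element of `powers`: x raised to that power.
new_cols <- lapply(powers, function(p) dat$x^p)
names(new_cols) <- paste0("x_pow", powers)

# Bind the generated columns to the original data.
dat <- cbind(dat, as.data.frame(new_cols))
```

The point of the sketch is only that the input is a plain vector and the output has one column per element, which is why a tidy-select expression such as starts_with("cat") does not fit as the first argument.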
However, we could create a named list of data.frames on the fly as input to over and then produce a similar outcome. Having a dedicated function like fold and fold_over would still be helpful, I guess, since we wouldn't need to use one or several select calls as input to over.
# instead of fold_over we could do:
dat %>%
  summarise(over(list(cat = select(., starts_with("cat")),
                      dog = select(., starts_with("dog"))),
                 list(sum  = rowSums,
                      mean = rowMeans)))
#> # A tibble: 10 x 4
#>    cat_sum cat_mean dog_sum dog_mean
#>      <dbl>    <dbl>   <dbl>    <dbl>
#>  1      12     4         12      6
#>  2      11     3.67       3      1.5
#>  3      19     6.33       4      2
#>  4       6     2          9      4.5
#>  5       9     3         14      7
#>  6       4     1.33       7      3.5
#>  7       7     2.33      10      5
#>  8       8     2.67       3      1.5
#>  9       9     3          9      4.5
#> 10      10     3.33       7      3.5
Created on 2021-08-19 by the reprex package (v0.3.0)
Based on this gist, fold and fold_over might be useful add-on functions for a future version of dplyover. There should be a better name than fold for this kind of function.