TimTeaFan / dplyover

Create columns by applying functions to vectors and/or columns in 'dplyr'.
https://timteafan.github.io/dplyover/

rename `.()` (eval_string) function #8

Open TimTeaFan opened 3 years ago

TimTeaFan commented 3 years ago

When I started working on {dplyover}, over() was supposed to be the main function. To solve problems where two sets of columns are used to compute new columns, the helper function .() was introduced. It evaluates an interpolated string as a symbol, as in the example below:

library(dplyr)
library(dplyover)
iris <- as_tibble(iris)

iris %>%
  mutate(over(c("Sepal", "Petal"),
              ~ .("{.x}.Width") + .("{.x}.Length")
              ))
#> # A tibble: 150 x 7
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal Petal
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl> <dbl>
#> 1          5.1         3.5          1.4         0.2 setosa    8.6  1.60
#> 2          4.9         3            1.4         0.2 setosa    7.9  1.60
#> 3          4.7         3.2          1.3         0.2 setosa    7.9  1.5 
#> 4          4.6         3.1          1.5         0.2 setosa    7.7  1.7 
#> # ... with 146 more rows
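
For readers unfamiliar with the idea, the mechanism can be approximated in base R. This is only a rough sketch, not how {dplyover} implements .(): the helper name eval_col and its arguments are made up for illustration, and the string substitution is a simple stand-in for real interpolation.

```r
# Hypothetical helper (not part of {dplyover}): fill a prefix into a
# column-name template, turn the result into a symbol, and evaluate
# that symbol with the data frame's columns in scope.
eval_col <- function(template, x, data) {
  col_name <- sub("{.x}", x, template, fixed = TRUE)  # e.g. "Sepal.Width"
  eval(as.symbol(col_name), data)                     # look the column up by name
}

prefixes <- c("Sepal", "Petal")

# For each prefix, add the matching Width and Length columns.
sums <- lapply(prefixes, function(p) {
  eval_col("{.x}.Width", p, iris) + eval_col("{.x}.Length", p, iris)
})
names(sums) <- prefixes

head(as.data.frame(sums), 4)
```

The point is only that the string is resolved to a column at evaluation time, which is what lets one lambda address several related columns.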

Now that across2() exists, the above notation seems superfluous:

iris %>%
    mutate(across2(ends_with("Width"),
                   ends_with("Length"),
                   ~ .x + .y,
                   .names = "{pre}"))
#> # A tibble: 150 x 7
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal Petal
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl> <dbl>
#> 1          5.1         3.5          1.4         0.2 setosa    8.6   1.6
#> 2          4.9         3            1.4         0.2 setosa    7.9   1.6
#> 3          4.7         3.2          1.3         0.2 setosa    7.9   1.5
#> 4          4.6         3.1          1.5         0.2 setosa    7.7   1.7
#> # ... with 146 more rows

I wonder if we still need the functionality of .(). With across2() available, the only remaining use cases for .() are those where more than two sets of columns are used to calculate a variable. These cases seem to be quite uncommon. Then again, {dplyover} is aimed at exactly those data wrangling tasks which are hard to implement using only {dplyr}.

If we keep the functionality of .(), then it should either be renamed (for example to ..()) or no longer be exported, to avoid namespace conflicts.

At the moment, .() is used inside {data.table}, and . (as a non-function object) is used in {magrittr}. Neither conflicts with dplyover::.(): {magrittr}'s . does not refer to a function, and data.table::.() is not an exported function. To avoid conflicts with other packages, it would be great if dplyover::.() could be internalized as well (similar to the corresponding {data.table} function). However, a first attempt to define dplyover::.() as an internal function did not work, because users call it from the global environment. It seems that if the function cannot be found on the search path, an error is thrown once dplyr::mutate evaluates its arguments.
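
The lookup problem can be illustrated in base R. This is a simplified sketch under stated assumptions: user_env is an isolated stand-in for the user's calling environment, internal_dot is a hypothetical non-exported helper, and nothing here reproduces the actual internals of {data.table} or dplyr::mutate.

```r
# Stand-in for the environment the user calls from (assumption: isolated,
# so nothing named `.` is reachable from it).
user_env <- new.env(parent = emptyenv())

# The call as a user would write it, captured unevaluated.
user_expr <- quote(.("sepal"))

# Hypothetical helper that a package keeps internal (not exported).
internal_dot <- function(x) toupper(x)

# 1) Evaluating the call where `.` is not on the lookup chain fails.
plain_lookup <- tryCatch(
  eval(user_expr, user_env),
  error = function(e) e
)
inherits(plain_lookup, "error")

# 2) Code that captures the expression can evaluate it in a child
#    environment in which the helper is bound under the name `.`.
eval_env <- new.env(parent = user_env)
assign(".", internal_dot, envir = eval_env)
injected_lookup <- eval(user_expr, eval_env)
injected_lookup
```

The second step only works when the package controls where the expression is evaluated. Whether that is feasible for a lambda passed through dplyr::mutate is exactly the open question here.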

Renaming is probably the easiest way to prevent namespace conflicts.