etiennebacher / tidypolars

Get the power of polars with the syntax of the tidyverse
https://tidypolars.etiennebacher.com

grouping + `everything()` does not seem to work correctly #65

Closed: eitsupi closed this issue 10 months ago

eitsupi commented 10 months ago

everything() should select every column except the current grouping columns.

library(dplyr, warn.conflicts = FALSE)
library(tidypolars, warn.conflicts = FALSE)
#> Registered S3 method overwritten by 'tidypolars':
#>   method          from
#>   print.DataFrame polars

polars::as_polars_lf(mtcars) |>
  summarise(across(.fns = mean), .by = "cyl") |>
  collect()
#> Error in `tidyselect_named_arg()`:
#> ! Can't subset columns that don't exist.
#> ✖ Column `mean` doesn't exist.
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::collect(...)
#>   2. ├─dplyr::summarise(...)
#>   3. ├─tidypolars:::summarise.LazyFrame(...)
#>   4. │ └─tidypolars:::translate_dots(.data = .data, ..., env = rlang::current_env())
#>   5. │   └─base::lapply(...)
#>   6. │     └─tidypolars (local) FUN(X[[i]], ...)
#>   7. │       └─tidypolars:::translate_expr(...)
#>   8. │         └─tidypolars:::unpack_across(.data, expr)
#>   9. │           └─tidypolars:::tidyselect_named_arg(.data, enquo(.cols))
#>  10. │             └─tidyselect::eval_select(cols, data = data)
#>  11. │               └─tidyselect:::eval_select_impl(...)
#>  12. │                 ├─tidyselect:::with_subscript_errors(...)
#>  13. │                 │ └─rlang::try_fetch(...)
#>  14. │                 │   └─base::withCallingHandlers(...)
#>  15. │                 └─tidyselect:::vars_select_eval(...)
#>  16. │                   └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  17. │                     └─tidyselect:::as_indices_sel_impl(...)
#>  18. │                       └─tidyselect:::as_indices_impl(...)
#>  19. │                         └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
#>  20. │                           └─vctrs::vec_as_location(...)
#>  21. └─vctrs (local) `<fn>`()
#>  22.   └─vctrs:::stop_subscript_oob(...)
#>  23.     └─vctrs:::stop_subscript(...)
#>  24.       └─rlang::abort(...)

Created on 2023-11-30 with reprex v2.0.2

etiennebacher commented 10 months ago

Thanks @eitsupi, there are actually two separate issues in your example.

Using across() without specifying .cols

Calling across() without .cols was deprecated in dplyr 1.1.0, so tidypolars now throws an error in that case:

library(dplyr, warn.conflicts = FALSE)
library(tidypolars, warn.conflicts = FALSE)
#> Registered S3 method overwritten by 'tidypolars':
#>   method          from  
#>   print.DataFrame polars

mtcars |> 
  head(n = 1) |> 
  mutate(across(.fns = mean))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(.fns = mean)`.
#> Caused by warning:
#> ! Using `across()` without supplying `.cols` was deprecated in dplyr 1.1.0.
#> ℹ Please supply `.cols` instead.
#>           mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

mtcars |> 
  head(n = 1) |> 
  as_polars() |> 
  mutate(across(.fns = mean))
#> Error in `mutate()`:
#> ! You must supply the argument `.cols` in `across()`.
#> Backtrace:
#>     ▆
#>  1. ├─dplyr::mutate(as_polars(head(mtcars, n = 1)), across(.fns = mean))
#>  2. └─tidypolars:::mutate.DataFrame(as_polars(head(mtcars, n = 1)), across(.fns = mean))
#>  3.   └─tidypolars:::translate_dots(.data = .data, ..., env = rlang::current_env()) at tidypolars/R/mutate.R:82:3
#>  4.     └─base::lapply(...) at tidypolars/R/utils-expr.R:6:3
#>  5.       └─tidypolars (local) FUN(X[[i]], ...)
#>  6.         └─tidypolars:::translate_expr(...) at tidypolars/R/utils-expr.R:7:5
#>  7.           └─tidypolars:::unpack_across(.data, expr, env) at tidypolars/R/utils-expr.R:51:5
#>  8.             └─tidypolars:::get_arg(".cols", 1, expr, env) at tidypolars/R/utils-across.R:5:3
#>  9.               └─rlang::abort(...) at tidypolars/R/utils-across.R:88:5
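
For comparison, here is a minimal sketch of the non-deprecated spelling, where .cols is passed explicitly. It uses plain dplyr only, so it runs without polars installed. The tidypolars version would presumably just add as_polars() before mutate(), but that is an assumption and is not shown in the output above. The name `res` is only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Passing `.cols` explicitly avoids the dplyr deprecation warning.
# The data is not grouped here, so everything() selects all 11 columns.
res <- mtcars |>
  head(n = 1) |>
  mutate(across(.cols = everything(), .fns = mean))

# With a single row, each column mean is just the original value.
res
```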

Remove the groups from the everything() selection

This was a bug, but it didn't actually show up in your example, which failed earlier because of the incorrect handling (now fixed) of across() without .cols. With .cols supplied, dplyr and tidypolars return the same values:

library(dplyr, warn.conflicts = FALSE)
library(tidypolars, warn.conflicts = FALSE)
#> Registered S3 method overwritten by 'tidypolars':
#>   method          from  
#>   print.DataFrame polars

mtcars |> 
  head(n = 5) |> 
  summarize(across(everything(), .fns = mean), .by = "cyl")
#>   cyl      mpg     disp  hp     drat       wt  qsec        vs        am
#> 1   6 21.13333 192.6667 110 3.626667 2.903333 17.64 0.3333333 0.6666667
#> 2   4 22.80000 108.0000  93 3.850000 2.320000 18.61 1.0000000 1.0000000
#> 3   8 18.70000 360.0000 175 3.150000 3.440000 17.02 0.0000000 0.0000000
#>       gear carb
#> 1 3.666667    3
#> 2 4.000000    1
#> 3 3.000000    2

mtcars |> 
  head(n = 5) |> 
  as_polars() |> 
  summarize(across(everything(), .fns = mean), .by = "cyl")
#> shape: (3, 11)
#> ┌─────┬───────────┬────────────┬───────┬───┬──────────┬──────────┬──────────┬──────┐
#> │ cyl ┆ mpg       ┆ disp       ┆ hp    ┆ … ┆ vs       ┆ am       ┆ gear     ┆ carb │
#> │ --- ┆ ---       ┆ ---        ┆ ---   ┆   ┆ ---      ┆ ---      ┆ ---      ┆ ---  │
#> │ f64 ┆ f64       ┆ f64        ┆ f64   ┆   ┆ f64      ┆ f64      ┆ f64      ┆ f64  │
#> ╞═════╪═══════════╪════════════╪═══════╪═══╪══════════╪══════════╪══════════╪══════╡
#> │ 4.0 ┆ 22.8      ┆ 108.0      ┆ 93.0  ┆ … ┆ 1.0      ┆ 1.0      ┆ 4.0      ┆ 1.0  │
#> │ 6.0 ┆ 21.133333 ┆ 192.666667 ┆ 110.0 ┆ … ┆ 0.333333 ┆ 0.666667 ┆ 3.666667 ┆ 3.0  │
#> │ 8.0 ┆ 18.7      ┆ 360.0      ┆ 175.0 ┆ … ┆ 0.0      ┆ 0.0      ┆ 3.0      ┆ 2.0  │
#> └─────┴───────────┴────────────┴───────┴───┴──────────┴──────────┴──────────┴──────┘
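
As a quick sanity check of the expected semantics, the sketch below confirms on the dplyr side that everything() inside across() leaves out the .by column, so `cyl` appears once, as the group key. It assumes dplyr >= 1.1.0 (needed for `.by`). The names `res` and `expected_names` are only for illustration, and the same check against the tidypolars result is not shown here.

```r
library(dplyr, warn.conflicts = FALSE)

res <- mtcars |>
  head(n = 5) |>
  summarize(across(everything(), .fns = mean), .by = "cyl")

# The grouping column comes first and is not summarised a second time;
# all remaining columns keep their original order.
expected_names <- c("cyl", setdiff(names(mtcars), "cyl"))
identical(names(res), expected_names)
```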

Can you try with the development version and reopen if you still have this kind of issue?

eitsupi commented 10 months ago

Thanks for the quick update! Looks fine.