Closed etiennebacher closed 4 months ago
> 2. if two packages have the same function (e.g. `dplyr::lag()` and `stats::lag()`) then I have to favor one or handle arguments in a convoluted way
IIUC, within `arrow_dplyr_query` it is not recognized which package the function came from. There are simply two functions registered, for example `foo::bar` and `bar`.
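To make the consequence concrete, here is a minimal sketch of such a registry in base R. It is not arrow's actual code: `registry`, `register_translation()` and `lookup_translation()` are hypothetical names, and `baz` is an invented second package that also exports `bar`.

```r
# Hypothetical registry: one environment holding translations keyed by name.
registry <- new.env(parent = emptyenv())

# Register a translation under both the qualified and the bare name.
register_translation <- function(pkg, fun, impl) {
  assign(paste0(pkg, "::", fun), impl, envir = registry)
  assign(fun, impl, envir = registry)
  invisible(NULL)
}

# Look up a translation using only the text of the call head.
lookup_translation <- function(expr) {
  key <- deparse(expr[[1]])
  get0(key, envir = registry, inherits = FALSE)
}

register_translation("foo", "bar", function(x) "foo version")
register_translation("baz", "bar", function(x) "baz version")

qualified <- lookup_translation(quote(foo::bar(1)))(1)
bare <- lookup_translation(quote(bar(1)))(1)

qualified  # "foo version"
bare       # "baz version": the last registration wins
```

In this sketch the bare name resolves to whichever package registered last, regardless of which packages are attached or in which order.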
See apache/arrow#13160
Apparently `arrow` doesn't detect some cases when a function is masked by a package:
```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

df <- data.frame(x = as.Date("2020-01-01"))
mt <- arrow_table(df, as_data_frame = FALSE)

# Rightfully errors since data.table::quarter() only has arg "x"
df |>
  mutate(dt = quarter(x, fiscal_start = 2))
#> Error in `mutate()`:
#> ℹ In argument: `dt = quarter(x, fiscal_start = 2)`.
#> Caused by error in `quarter()`:
#> ! unused argument (fiscal_start = 2)

mt |>
  mutate(dt = quarter(x, fiscal_start = 2)) |>
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Function 'quarter' accepts 1 arguments but 2 passed

library(lubridate, warn.conflicts = FALSE)

# ideally shouldn't error because lubridate::quarter() now masks data.table::quarter()
df |>
  mutate(lub = quarter(x, fiscal_start = 2))
#> x lub
#> 1 2020-01-01 4

mt |>
  mutate(lub = quarter(x, fiscal_start = 2)) |>
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Function 'quarter' accepts 1 arguments but 2 passed
```
From reading the source code, it appears that the `quarter` function is mapped directly to the libarrow compute kernel's `quarter` function in `arrow_dplyr_query` to begin with, and takes only one argument.
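For comparison, R's own view of which definition wins for a bare name can be read off the search path. The sketch below uses `utils::find()`, with `filter` standing in for `quarter` so that it runs without extra packages. The variable names are illustrative only.

```r
# Attached packages that provide a function called `filter`,
# listed in search-path order (the first entry masks the others).
hits <- utils::find("filter", mode = "function")

# Strip the "package:" prefix to get the name of the winning package.
winner <- sub("^package:", "", hits[1])

hits
winner
```

In a session where `dplyr` is attached after `stats`, `hits` would list `package:dplyr` before `package:stats`. A translation layer would need this ordering to honour masking.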
Currently `tidypolars` translates functions by prefixing the function name with `pl_`, which has 2 limitations:

1. […] (`dplyr::`) in expressions
2. if two packages have the same function (e.g. `dplyr::lag()` and `stats::lag()`) then I have to favor one or handle arguments in a convoluted way

One solution could be to do like `arrow` (based on a quick glance at their internals) and populate an environment that they call `.cache`.

Edit: actually an environment may not be required, I just need to extract the info on which namespace a function comes from and then have functions like `pl_dplyr_lag` and `pl_stats_lag`:
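A minimal sketch of that idea, using base R only. The helpers `resolve_namespace()` and `translation_name()` are hypothetical and not part of tidypolars. The sketch assumes the call head is either a bare symbol or an explicit `pkg::fun`.

```r
# Find the namespace that a bare function name currently resolves to.
resolve_namespace <- function(fun_name, env = parent.frame()) {
  fun <- get0(fun_name, envir = env, mode = "function")
  if (is.null(fun)) return(NA_character_)
  fun_env <- environment(fun)
  # Primitives such as `sum` have no environment; they live in base.
  if (is.null(fun_env)) return("base")
  environmentName(topenv(fun_env))
}

# Build a name of the form pl_<package>_<function> for a call.
translation_name <- function(expr, env = parent.frame()) {
  head <- expr[[1]]
  if (is.call(head) && identical(head[[1]], as.name("::"))) {
    # Explicit prefix: take the package straight from the call.
    pkg <- as.character(head[[2]])
    fun <- as.character(head[[3]])
  } else {
    # Bare name: ask R which definition is visible right now.
    fun <- as.character(head)
    pkg <- resolve_namespace(fun, env)
  }
  paste("pl", pkg, fun, sep = "_")
}

translation_name(quote(stats::lag(x)))  # "pl_stats_lag"
translation_name(quote(dplyr::lag(x)))  # "pl_dplyr_lag"
translation_name(quote(lag(x)))         # depends on which packages are attached
```

Because the bare-name branch uses normal R lookup, it follows masking: the result for `lag(x)` changes once `dplyr` is attached. A re-exported function resolves to the namespace where it was originally defined, which may need special handling.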