SebKrantz / collapse

Advanced and Fast Data Transformation in R
https://sebkrantz.github.io/collapse/

Qualified calls and namespace resolution for `across` and related functions #621

Closed by alinacherkas 2 months ago

alinacherkas commented 2 months ago

Describe the bug

Consider a use case where collapse functions are invoked through qualified calls, i.e., collapse::function. This is the preferred approach when writing another package or when one simply wants to keep the environment clean. Take the following example from the across documentation:

Scenario 1: Attach the package

This works:

library(collapse)
fsummarise(wlddev, across(PCGDP:GINI, fmean, w = POP))
#>      PCGDP   LIFEEX     GINI
#> 1 7956.238 65.88068 39.52428

Scenario 2: Qualified calls

This doesn't work:

collapse::fsummarise(collapse::wlddev, collapse::across(PCGDP:GINI, collapse::fmean, w = POP))
#> Error: 'across' is not an exported object from 'namespace:collapse'

Scenario 3: Qualified calls after exporting across

Let's modify NAMESPACE to add export(across) and rebuild collapse. This still doesn't work but the error is different:

collapse::fsummarise(collapse::wlddev, collapse::across(PCGDP:GINI, collapse::fmean, w = POP))
#> Error in collapse::across(PCGDP:GINI, collapse::fmean, w = POP) : 
#>  across() can only work inside fmutate() and fsummarise()

This is expected because throwing an error is, in fact, all that collapse::across does. Looking into the definition of fsummarise, one can notice that the heavy lifting is actually done by do_across, and across is just a call name. Enter scenario 4.
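To illustrate why a bare call name and a qualified one behave differently, here is a minimal base R sketch. It is not collapse's actual implementation: `is_across_call()` and `is_qualified_across_call()` are hypothetical helpers that only show how code matching on a call name sees the two forms.

```r
# Hypothetical helpers for illustration only; not part of collapse.

# TRUE when the expression is a call whose head is the bare symbol `across`.
is_across_call <- function(expr) {
  is.call(expr) && identical(expr[[1L]], as.name("across"))
}

# TRUE when the head is itself a call of the form pkg::across.
is_qualified_across_call <- function(expr) {
  if (!is.call(expr)) return(FALSE)
  head <- expr[[1L]]
  is.call(head) && length(head) == 3L &&
    identical(head[[1L]], as.name("::")) &&
    identical(head[[3L]], as.name("across"))
}

unqualified <- quote(across(PCGDP:GINI, fmean, w = POP))
qualified   <- quote(collapse::across(PCGDP:GINI, fmean, w = POP))

class(unqualified[[1L]])             # "name": the head is a plain symbol
class(qualified[[1L]])               # "call": the head is a call to `::`

is_across_call(unqualified)          # TRUE
is_across_call(qualified)            # FALSE
is_qualified_across_call(qualified)  # TRUE
```

A check that only compares the head against the symbol `across` therefore misses the qualified form, which would then be evaluated as an ordinary function call.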

Scenario 4: Qualified calls after replacing across with a dummy

This doesn't work, but the error is now related to fmean:

# assign a different function to the same name
across <- dplyr::across
collapse::fsummarise(collapse::wlddev, across(PCGDP:GINI, collapse::fmean, w = POP))
#> Error in fmean(.data_, w = POP, drop = FALSE) : 
#>  could not find function "fmean"

So slightly modifying the above, one obtains a working solution:

across <- dplyr::across
collapse::fsummarise(collapse::wlddev, across(PCGDP:GINI, collapse::fmean.default, w = POP))
#>      PCGDP   LIFEEX     GINI
#> 1 7956.238 65.88068 39.52428

In fact, defining across is not even needed:

# clean the environment
across
#> Error: object 'across' not found
# this still works
collapse::fsummarise(collapse::wlddev, across(PCGDP:GINI, collapse::fmean.default, w = POP))
#>      PCGDP   LIFEEX     GINI
#> 1 7956.238 65.88068 39.52428

Questions

  1. Is this the expected behaviour?

In dplyr, for example, you can do both:

dplyr::summarise(collapse::wlddev, across(PCGDP:GINI, ~ weighted.mean(.x, w = POP, na.rm = T)))
#>      PCGDP LIFEEX     GINI
#> 1 7956.238     NA 39.52428
dplyr::summarise(collapse::wlddev, dplyr::across(PCGDP:GINI, ~ weighted.mean(.x, w = POP, na.rm = T)))
#>      PCGDP LIFEEX     GINI
#> 1 7956.238     NA 39.52428

This is possible because they check the calling environment in a different way.

  2. What is the preferred way to perform the above calculation without attaching the package?

While this works:

collapse::fsummarise(collapse::wlddev, across(PCGDP:GINI, collapse::fmean.default, w = POP))

Not qualifying across seems odd.

  3. What is the recommended way to call fmean and related functions? As shown above, collapse::fmean does not properly resolve.

Steps/Code to Reproduce

collapse::fsummarise(collapse::wlddev, collapse::across(PCGDP:GINI, collapse::fmean, w = POP))
#> Error: 'across' is not an exported object from 'namespace:collapse'

Expected Results

collapse::fsummarise(collapse::wlddev, collapse::across(PCGDP:GINI, collapse::fmean, w = POP))
#>      PCGDP   LIFEEX     GINI
#> 1 7956.238 65.88068 39.52428

Actual Results

collapse::fsummarise(collapse::wlddev, collapse::across(PCGDP:GINI, collapse::fmean, w = POP))
#> Error: 'across' is not an exported object from 'namespace:collapse'

If exported as is:

collapse::fsummarise(collapse::wlddev, collapse::across(PCGDP:GINI, collapse::fmean, w = POP))
#> Error in collapse::across(PCGDP:GINI, collapse::fmean, w = POP) : 
#>  across() can only work inside fmutate() and fsummarise()

Session Info

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: ...
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.3.2    parallel_4.3.2    tools_4.3.2       rstudioapi_0.16.0 Rcpp_1.0.13      
[6] collapse_2.0.15   renv_1.0.3 

P.S. Thank you for this useful and performant package!

SebKrantz commented 2 months ago

Thanks, I'll see what I can do about fmean. across() is not exported because that would create a namespace conflict with dplyr. I had thought about calling it facross(), but decided against it because qualified names are unnecessary in this case, although I of course understand that this comes as a surprise.

SebKrantz commented 2 months ago

This is now fixed.