atorus-research / Tplyr

https://atorus-research.github.io/Tplyr/
Other
95 stars 16 forks source link

Argument `where` in `group_count` cannot find functions defined by users #154

Closed yangwenghou123 closed 8 months ago

yangwenghou123 commented 8 months ago

Description

when a R script containing Tplyr code is sourced in a dedicated environment, user customized external functions cannot be found. I am not sure if it is a bug.

Steps to Reproduce (Bug Report Only)

following is a R script named test_Tplyr_envir.R

library(magrittr) library(admiral) library(Tplyr)

adsl <- admiral_adsl

fun_missing <- function(x){ is.na(x) | as.character(x) == "" | as.character(x) == " " }

t <- tplyr_table(adsl, TRT01P) %>% add_layer( group_count(AGEGR1, by = "Age categories n (%)", where = !fun_missing(AGEGR1)) %>% set_format_strings(f_str("xx (xx.x%)", n, pct)) %>% add_total_row() )

build(t)

following is another R script to source test_Tplyr_envir.R

source <- function(file, ..., local = nenv){
base::source(file, ... , local = local)
}

path2pgm <- "~/R/test_Tplyr_envir.R" nenv <- new.env()

try(source(path2pgm, echo= TRUE, max.deparse.length = Inf, local = nenv))

Expected behavior: customized function fun_missing can be found

Actual behavior: Caused by error in value[[3L]](): ! group_count where condition !fun_missing(AGEGR1) is invalid. Filter error: Error in filter(): In argument: !fun_missing(AGEGR1). Caused by error in fun_missing(): ! could not find function "fun_missing"

Versions

sessionInfo() R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 [4] LC_NUMERIC=C LC_TIME=English_United States.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] Tplyr_1.1.0 admiral_0.8.3 magrittr_2.0.3

loaded via a namespace (and not attached): [1] rstudioapi_0.15.0 knitr_1.43 hms_1.1.3 tidyselect_1.2.0 timechange_0.2.0 R6_2.5.1 rlang_1.1.0
[8] fastmap_1.1.0 fansi_1.0.3 stringr_1.5.0 dplyr_1.1.2 tools_4.2.1 xfun_0.39 utf8_1.2.2
[15] cli_3.4.1 htmltools_0.5.5 yaml_2.3.7 assertthat_0.2.1 digest_0.6.29 tibble_3.2.1 lifecycle_1.0.3
[22] purrr_1.0.1 tidyr_1.3.0 vctrs_0.6.1 glue_1.6.2 evaluate_0.21 admiraldev_0.1.0 rmarkdown_2.23
[29] stringi_1.7.8 compiler_4.2.1 pillar_1.9.0 forcats_1.0.0 generics_0.1.3 lubridate_1.9.2 pkgconfig_2.0.3

mstackhouse commented 8 months ago

The fix for this is actually rather simple - I just want to be cautious of the potential consequences.

The underlying issue is that the tplyr_table() environment doesn't inherit from global, as can be seen here. If I set the parent of the Tplyr table to the global environment, the unit test failures are rather minor and I think inconsequential.

In short this should be a easy fix we can put in the next wave but I want to exercise some caution.

Ok so I dug into this more and this is obscure. The actual reason the function can't be found is because the tplyr_table object is a child environment of a function run within the tplyr namespace. So the parent environment list looks like this:

> rlang::env_parents(t)
[[1]]   <env: 0x556593ec37a8>
[[2]] $ <env: namespace:Tplyr>
[[3]] $ <env: imports:Tplyr>
[[4]] $ <env: namespace:base>
[[5]] $ <env: global>

When you run within a separate dedicated environment like in the example, that environment is a child of global, so when it comes to visibility, the user defined function basically sits parallel to global.

I'm going to set this to won't fix, because I can't see a simple way to update it without risking some potentially really weird side effects. Rather, this is something I'd like to be more cognizant of when building tplyr2

yangwenghou123 commented 8 months ago

Thanks for the investigation. is there any suggestion for workaround except put objects into Global? I noticed function set_precision_data has the similar problem when data frame is created in a function.

tfunc <- function(){
  prec <- tibble::tribble(
    ~vs, ~max_int, ~max_dec,
    0,        1,        1,
    1,        2,        2
  )

  tplyr_table(mtcars, gear) %>%
    add_layer(
      group_desc(wt, by = vs) %>%
        set_format_strings(
          'Mean (SD)' = f_str('a.a+1 (a.a+2)', mean, sd)
        ) %>%
        set_precision_data(prec) %>%
        set_precision_on(wt)
    ) %>%
    build()
}

tfunc()
mstackhouse commented 8 months ago

Thank you! That example led me down the right path and this is actually not nearly the issue I thought it was. #161 is in as a fix.