OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics

Use of formals() in CohortDiagnostics::executeDiagnostics #1070

Closed mvankessel-EMC closed 3 days ago

mvankessel-EMC commented 1 year ago

When the arguments of executeDiagnostics() are parsed here, only the default values are passed to the variable callingArgsJson; any values supplied by the caller are ignored.

Here is an example demonstrating this with a dummy function foo():

foo <- function(bar = 10, baz = 20) {
  args <- formals(foo)
  return(list(bar = args$bar, baz = args$baz))
}

foo()
#> $bar
#> [1] 10
#> 
#> $baz
#> [1] 20

foo(1, 2)
#> $bar
#> [1] 10
#> 
#> $baz
#> [1] 20

Created on 2023-08-18 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16 ucrt)
#>  os       Windows 11 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Netherlands.utf8
#>  ctype    Dutch_Netherlands.utf8
#>  tz       Europe/Amsterdam
#>  date     2023-08-18
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.1)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.1)
#>  knitr         1.43    2023-05-25 [1] RSPM (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.1)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.1)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.1)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.23    2023-07-01 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.1)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.1)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.1)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#>
#>  [1] C:/R/R-4.3.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
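
As a minimal sketch of why this happens (illustrative only, not code from the package): formals() reads the defaults stored in the function definition, whereas an argument name used inside the body is bound to the value of the current call. The element names fromFormals and fromEnvironment are made up for this illustration.

```r
foo <- function(bar = 10, baz = 20) {
  list(
    # formals() inspects the definition of foo, so this is always the default
    fromFormals = formals(foo)$bar,
    # the argument itself holds whatever the caller supplied
    fromEnvironment = bar
  )
}

res <- foo(1, 2)
res$fromFormals
res$fromEnvironment
```

Here res$fromFormals holds the default from the definition, while res$fromEnvironment holds the value passed in the call.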

In the following example I've simplified the definition of executeDiagnostics() to only produce the JSON, and cut the parameters down to those that are read from the result of formals().

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Simplified dummy function definition
executeDiagnostics <- function(runInclusionStatistics = TRUE,
                               runIncludedSourceConcepts = TRUE,
                               runOrphanConcepts = TRUE,
                               runTimeSeries = FALSE,
                               runVisitContext = TRUE,
                               runBreakdownIndexEvents = TRUE,
                               runIncidenceRate = TRUE,
                               runCohortRelationship = TRUE,
                               runTemporalCohortCharacterization = TRUE,
                               minCellCount = 5,
                               minCharacterizationMean = 0.01,
                               incremental = FALSE
                               ) {

  callingArgs <- formals(executeDiagnostics)
  callingArgsJson <-
    list(
      runInclusionStatistics = callingArgs$runInclusionStatistics,
      runIncludedSourceConcepts = callingArgs$runIncludedSourceConcepts,
      runOrphanConcepts = callingArgs$runOrphanConcepts,
      runTimeSeries = callingArgs$runTimeSeries,
      runVisitContext = callingArgs$runVisitContext,
      runBreakdownIndexEvents = callingArgs$runBreakdownIndexEvents,
      runIncidenceRate = callingArgs$runIncidenceRate,
      runTemporalCohortCharacterization = callingArgs$runTemporalCohortCharacterization,
      minCellCount = callingArgs$minCellCount,
      minCharacterizationMean = callingArgs$minCharacterizationMean,
      incremental = callingArgs$incremental
    ) %>%
    RJSONIO::toJSON(digits = 23, pretty = TRUE)
  return(callingArgsJson)
}

# Running dummy with defaults
res1 <- executeDiagnostics()

# Running with flipped defaults
res2 <- executeDiagnostics(runInclusionStatistics = !TRUE,
                           runIncludedSourceConcepts = !TRUE,
                           runOrphanConcepts = !TRUE,
                           runTimeSeries = !FALSE,
                           runVisitContext = !TRUE,
                           runBreakdownIndexEvents = !TRUE,
                           runIncidenceRate = !TRUE,
                           runCohortRelationship = !TRUE,
                           runTemporalCohortCharacterization = !TRUE,
                           minCellCount = -5,
                           minCharacterizationMean = -0.01,
                           incremental = !FALSE)

res1
#> [1] "{\n\t\"runInclusionStatistics\" : true,\n\t\"runIncludedSourceConcepts\" : true,\n\t\"runOrphanConcepts\" : true,\n\t\"runTimeSeries\" : false,\n\t\"runVisitContext\" : true,\n\t\"runBreakdownIndexEvents\" : true,\n\t\"runIncidenceRate\" : true,\n\t\"runTemporalCohortCharacterization\" : true,\n\t\"minCellCount\" : 5,\n\t\"minCharacterizationMean\" : 0.010000000000000000208167,\n\t\"incremental\" : false\n}"

res2
#> [1] "{\n\t\"runInclusionStatistics\" : true,\n\t\"runIncludedSourceConcepts\" : true,\n\t\"runOrphanConcepts\" : true,\n\t\"runTimeSeries\" : false,\n\t\"runVisitContext\" : true,\n\t\"runBreakdownIndexEvents\" : true,\n\t\"runIncidenceRate\" : true,\n\t\"runTemporalCohortCharacterization\" : true,\n\t\"minCellCount\" : 5,\n\t\"minCharacterizationMean\" : 0.010000000000000000208167,\n\t\"incremental\" : false\n}"

# Check if results are identical
identical(res1, res2)
#> [1] TRUE

Created on 2023-08-18 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16 ucrt)
#>  os       Windows 11 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Netherlands.utf8
#>  ctype    Dutch_Netherlands.utf8
#>  tz       Europe/Amsterdam
#>  date     2023-08-18
#>  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.1)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.1)
#>  dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.1)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.1)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.1)
#>  knitr         1.43    2023-05-25 [1] RSPM (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.1)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.1)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.1)
#>  RJSONIO       1.3-1.8 2023-01-31 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.23    2023-07-01 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.3.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.1)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.1)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.1)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.1)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#>
#>  [1] C:/R/R-4.3.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

Original post in the DARWIN fork

azimov commented 1 year ago

@mvankessel-EMC thanks for this - I'm not quite sure what we use the json for?

@gowthamrao is it used in the metadata to check which calling arguments were set by the user? If so, calling as.list(environment()) %>% RJSONIO::toJSON(digits = 23, pretty = TRUE) would achieve this more elegantly.
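
A minimal sketch of the as.list(environment()) suggestion, applied to the foo() example from the first post. It assumes the call is the first statement in the body, so that no local variables have been created yet and the environment holds only the arguments. The JSON conversion is left out to keep the sketch free of package dependencies.

```r
foo <- function(bar = 10, baz = 20) {
  # Capture the argument values of this call before any locals exist.
  # as.list() on an environment does not guarantee an order, so the
  # elements are best accessed by name.
  args <- as.list(environment())
  return(list(bar = args$bar, baz = args$baz))
}

res1 <- foo()
res2 <- foo(1, 2)

# Unlike the formals() version, the two results now differ
identical(res1, res2)
```

With this change foo(1, 2) reports the supplied values rather than the defaults.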

mvankessel-EMC commented 1 year ago

@azimov I took the liberty of tracing how callingArgsJson is used. From what I can gather, it follows this path:

  1. executeDiagnostics() RunDiagnostics.R
     a. L217-L232
     b. L962-L1014, specifically L973
     c. L1015-L1020
     d. L1021-L1025
  2. makeDataExportable() Private.R L121-L250
  3. enforceMinCellValueDataframe() Private.R L252-L273
  4. enforceMinCellValue() Private.R L57-L80

azimov commented 1 year ago

Yes - it looks like this is just stored in the metadata result. We just need to create a list that stores the relevant arguments (and doesn't leak user data, e.g. connectionDetails).
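
One way this could look, as a rough sketch only: capture the calling arguments, then keep an explicit allow-list of the ones that are safe to store. The function name executeDiagnosticsSketch, its reduced parameter set, and the contents of relevantArgs are assumptions made for illustration, not the package's actual implementation.

```r
# Hypothetical, cut-down stand-in for executeDiagnostics()
executeDiagnosticsSketch <- function(connectionDetails = NULL,
                                     minCellCount = 5,
                                     incremental = FALSE) {
  # Values of this call, captured before any local variables exist
  callingArgs <- as.list(environment())

  # Allow-list (assumed): only these arguments end up in the metadata,
  # so connectionDetails can never leak into the result
  relevantArgs <- c("minCellCount", "incremental")
  callingArgs[relevantArgs]
}

res <- executeDiagnosticsSketch(
  connectionDetails = list(password = "secret"),
  minCellCount = 10
)
names(res)
```

The resulting list could then be serialized with RJSONIO::toJSON() as in the existing code. An allow-list is used here rather than dropping connectionDetails by name, so that arguments added later are excluded unless someone opts them in.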