darwin-eu-dev / CohortCharacteristics

https://darwin-eu-dev.github.io/CohortCharacteristics/
Apache License 2.0

consider dropping Q05 and Q95 from defaults, or include with median in table #25

Closed: edward-burn closed this issue 3 months ago

edward-burn commented 3 months ago

At the moment, the default path produces a separate row with "[Q05 - Q95]". Maybe it's just a personal preference, but I think we could either omit this row from the default or include it with the median as "Median [Q05, Q25 - Q75, Q95]". What do you think @catalamarti @nmercadeb?
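To make the two layouts concrete, here is a minimal base-R sketch that formats the same quantiles either as the current two rows ("Median [Q25 - Q75]" plus a separate "[Q05 - Q95]") or as the single combined row proposed above. `median_rows()` and its `combined` argument are hypothetical names for illustration, not part of the CohortCharacteristics API, and the quantiles use R's default `quantile()` type 7, which may differ from how the package computes them.

```r
# Hypothetical helper (not part of CohortCharacteristics) contrasting the
# current default rows with the combined row proposed in this issue.
# Quantiles use R's default type 7, which may differ from the package.
median_rows <- function(x, combined = FALSE) {
  q <- stats::quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
                       names = FALSE, na.rm = TRUE)
  if (combined) {
    # One row: "Median [Q05, Q25 - Q75, Q95]"
    c("Median [Q05, Q25 - Q75, Q95]" =
        sprintf("%s [%s, %s - %s, %s]", q[3], q[1], q[2], q[4], q[5]))
  } else {
    # Two rows, mirroring the current default output
    c("Median [Q25 - Q75]" = sprintf("%s [%s - %s]", q[3], q[2], q[4]),
      "[Q05 - Q95]"        = sprintf("[%s - %s]", q[1], q[5]))
  }
}

ages <- 0:100
median_rows(ages)                  # two rows, as in the current default
median_rows(ages, combined = TRUE) # one combined row
```

The combined form keeps the tail information while saving a row per numeric variable, at the cost of a denser cell.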

library(CDMConnector)
library(CodelistGenerator)
library(CohortCharacteristics)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

con <- DBI::dbConnect(duckdb::duckdb(),
                      dbdir = CDMConnector::eunomia_dir())
cdm <- CDMConnector::cdm_from_con(con,
                                  cdm_schema = "main",
                                  write_schema = "main")

cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "all",
  overwrite = TRUE
)

cdm$ankle_sprain
#> # Source:   table<main.ankle_sprain> [?? x 4]
#> # Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/C:\Users\eburn\AppData\Local\Temp\RtmpaCkYa3\file88806dfc5a80.duckdb]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>         
#>  1                    1        712 2018-04-20        2018-05-04     
#>  2                    1       1057 1991-10-25        1991-11-15     
#>  3                    1       1950 1988-08-29        1988-09-19     
#>  4                    1       2654 1945-06-26        1945-07-31     
#>  5                    1       3001 1970-07-13        1970-08-10     
#>  6                    1       3528 1993-08-31        1993-09-28     
#>  7                    1       4459 1970-11-13        1970-11-27     
#>  8                    1       4523 1994-07-24        1994-08-21     
#>  9                    1       4631 2013-05-01        2013-05-22     
#> 10                    1       4724 1952-12-05        1953-01-09     
#> # ℹ more rows

ankle_sprain_characteristics <- summariseCharacteristics(cdm$ankle_sprain)
#> ℹ adding demographics columns
#> ℹ summarising data
#> ℹ The following estimates will be computed:
#> • variable_00001: count, percentage
#> • cohort_start_date: min, q05, q25, median, q75, q95, max
#> • cohort_end_date: min, q05, q25, median, q75, q95, max
#> • variable_00003: min, q05, q25, median, q75, q95, max, mean, sd
#> • variable_00004: min, q05, q25, median, q75, q95, max, mean, sd
#> • variable_00002: min, q05, q25, median, q75, q95, max, mean, sd
#> → Start summary of data, at 2024-04-12 09:43:38
#> 
#> ✔ Summary finished, at 2024-04-12 09:43:38
#> ✔ summariseCharacteristics finished!
tableCharacteristics(ankle_sprain_characteristics, type = "tibble")
#> # A tibble: 22 × 5
#>    `CDM name`                   `Variable name` `Variable level` `Estimate name`
#>    <chr>                        <chr>           <chr>            <chr>          
#>  1 Synthea synthetic health da… Number records  <NA>             N              
#>  2 Synthea synthetic health da… Number subjects <NA>             N              
#>  3 Synthea synthetic health da… Cohort start d… <NA>             Median [Q25 - …
#>  4 Synthea synthetic health da… Cohort start d… <NA>             [Q05 - Q95]    
#>  5 Synthea synthetic health da… Cohort start d… <NA>             Range          
#>  6 Synthea synthetic health da… Cohort end date <NA>             Median [Q25 - …
#>  7 Synthea synthetic health da… Cohort end date <NA>             [Q05 - Q95]    
#>  8 Synthea synthetic health da… Cohort end date <NA>             Range          
#>  9 Synthea synthetic health da… Age             <NA>             Median [Q25 - …
#> 10 Synthea synthetic health da… Age             <NA>             [Q05 - Q95]    
#> # ℹ 12 more rows
#> # ℹ 1 more variable: `[header]Cohort name\n[header_level]Ankle sprain` <chr>

Created on 2024-04-12 with reprex v2.0.2

nmercadeb commented 3 months ago

I would prefer to remove "[Q05 - Q95]" from the default rather than add it to the median row.
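Until the default changes, a user-side workaround is to drop the "[Q05 - Q95]" rows from the tibble returned by `tableCharacteristics(..., type = "tibble")` by filtering on its `Estimate name` column. The sketch below is self-contained, so it uses a mock built from the rows visible in the reprex output; the truncated names are assumed to read "Cohort start date" and "Median [Q25 - Q75]", and the estimate values are left out because the issue does not show them. This is a post-hoc filter, not a package option.

```r
# Mock of the first 10 rows returned by tableCharacteristics(..., type = "tibble")
# in the reprex. Truncated names are assumed; estimate values are omitted.
tab <- data.frame(
  `Variable name` = c("Number records", "Number subjects",
                      rep("Cohort start date", 3), rep("Cohort end date", 3),
                      rep("Age", 2)),
  `Estimate name` = c("N", "N",
                      rep(c("Median [Q25 - Q75]", "[Q05 - Q95]", "Range"), 2),
                      "Median [Q25 - Q75]", "[Q05 - Q95]"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep every row except the "[Q05 - Q95]" ones (the default preferred above)
tab_default <- tab[tab$`Estimate name` != "[Q05 - Q95]", , drop = FALSE]
nrow(tab_default)
#> [1] 7
```

The same subsetting should work on the real tibble, since base `[` indexing applies to tibbles as well as data frames.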