At the moment the default path produces a separate row labelled "[Q05 - Q95]". It may just be personal preference, but I think we could either omit this row from the defaults or fold it into the median row, e.g. "Median [Q05, Q25 - Q75, Q95]". What do you think @catalamarti @nmercadeb?
library(CDMConnector)
library(CodelistGenerator)
library(CohortCharacteristics)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.2.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
con <- DBI::dbConnect(duckdb::duckdb(),
                      dbdir = CDMConnector::eunomia_dir())
cdm <- CDMConnector::cdm_from_con(con,
                                  cdm_schema = "main",
                                  write_schema = "main")
cdm <- generateConceptCohortSet(
  cdm = cdm,
  name = "ankle_sprain",
  conceptSet = list("ankle_sprain" = 81151),
  end = "event_end_date",
  limit = "all",
  overwrite = TRUE
)
cdm$ankle_sprain
#> # Source: table<main.ankle_sprain> [?? x 4]
#> # Database: DuckDB v0.9.2 [eburn@Windows 10 x64:R 4.2.1/C:\Users\eburn\AppData\Local\Temp\RtmpaCkYa3\file88806dfc5a80.duckdb]
#>    cohort_definition_id subject_id cohort_start_date cohort_end_date
#>                   <int>      <int> <date>            <date>
#>  1                    1        712 2018-04-20        2018-05-04
#>  2                    1       1057 1991-10-25        1991-11-15
#>  3                    1       1950 1988-08-29        1988-09-19
#>  4                    1       2654 1945-06-26        1945-07-31
#>  5                    1       3001 1970-07-13        1970-08-10
#>  6                    1       3528 1993-08-31        1993-09-28
#>  7                    1       4459 1970-11-13        1970-11-27
#>  8                    1       4523 1994-07-24        1994-08-21
#>  9                    1       4631 2013-05-01        2013-05-22
#> 10                    1       4724 1952-12-05        1953-01-09
#> # ℹ more rows
ankle_sprain_characteristics <- summariseCharacteristics(cdm$ankle_sprain)
#> ℹ adding demographics columns
#> ℹ summarising data
#> ℹ The following estimates will be computed:
#> • variable_00001: count, percentage
#> • cohort_start_date: min, q05, q25, median, q75, q95, max
#> • cohort_end_date: min, q05, q25, median, q75, q95, max
#> • variable_00003: min, q05, q25, median, q75, q95, max, mean, sd
#> • variable_00004: min, q05, q25, median, q75, q95, max, mean, sd
#> • variable_00002: min, q05, q25, median, q75, q95, max, mean, sd
#> → Start summary of data, at 2024-04-12 09:43:38
#>
#> ✔ Summary finished, at 2024-04-12 09:43:38
#> ✔ summariseCharacteristics finished!
tableCharacteristics(ankle_sprain_characteristics, type = "tibble")
#> # A tibble: 22 × 5
#>    `CDM name`                   `Variable name` `Variable level` `Estimate name`
#>    <chr>                        <chr>           <chr>            <chr>
#>  1 Synthea synthetic health da… Number records  <NA>             N
#>  2 Synthea synthetic health da… Number subjects <NA>             N
#>  3 Synthea synthetic health da… Cohort start d… <NA>             Median [Q25 - …
#>  4 Synthea synthetic health da… Cohort start d… <NA>             [Q05 - Q95]
#>  5 Synthea synthetic health da… Cohort start d… <NA>             Range
#>  6 Synthea synthetic health da… Cohort end date <NA>             Median [Q25 - …
#>  7 Synthea synthetic health da… Cohort end date <NA>             [Q05 - Q95]
#>  8 Synthea synthetic health da… Cohort end date <NA>             Range
#>  9 Synthea synthetic health da… Age             <NA>             Median [Q25 - …
#> 10 Synthea synthetic health da… Age             <NA>             [Q05 - Q95]
#> # ℹ 12 more rows
#> # ℹ 1 more variable: `[header]Cohort name\n[header_level]Ankle sprain` <chr>
Created on 2024-04-12 with reprex v2.0.2
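To make the second option concrete, here is a rough sketch of how a combined "Median [Q05, Q25 - Q75, Q95]" value could be put together. It uses base R only; the helper `format_median_quantiles()` is hypothetical and is not part of CohortCharacteristics, and the rounding to whole numbers is just an assumption for the example.

```r
# Hypothetical helper (not part of CohortCharacteristics): builds the
# proposed combined value from a numeric vector using base R quantiles.
format_median_quantiles <- function(x) {
  # q05, q25, median, q75, q95, in that order
  q <- stats::quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), names = FALSE)
  # Layout: median [q05, q25 - q75, q95], rounded to whole numbers
  sprintf("%.0f [%.0f, %.0f - %.0f, %.0f]", q[3], q[1], q[2], q[4], q[5])
}

format_median_quantiles(0:100)
#> [1] "50 [5, 25 - 75, 95]"
```

This keeps all five quantiles in a single row, so the separate "[Q05 - Q95]" row would no longer be needed.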