Allow targetCohorts, eventCohorts, and exitCohorts to be in different tables

edward-burn commented 12 months ago

@mvankessel-EMC it would be nice if targetCohorts, eventCohorts, and exitCohorts could live in different tables in the cdm. This would make the analysis more flexible, since we may have created our cohorts in different tables to start with. It would also match the interface of other darwin packages, where we'd normally have something like targetCohortTable, targetCohortId = NULL, eventCohortTable, eventCohortId = NULL, exitCohortTable, exitCohortId = NULL to give the user a lot of flexibility.
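To make the `cohortId = NULL` convention concrete, here is a minimal base R sketch of how one table/ID pair could be resolved. `selectCohorts()` is a hypothetical helper written for illustration, not a function from TreatmentPatterns or any other darwin package, and it works on an in-memory data frame rather than a cdm table.

```r
# Hypothetical helper (not part of any package): keep the rows of a cohort
# table whose cohort_definition_id is listed in cohortId.
# cohortId = NULL means "use every cohort found in the table".
selectCohorts <- function(cohortTable, cohortId = NULL) {
  if (is.null(cohortId)) {
    return(cohortTable)
  }
  missingIds <- setdiff(cohortId, unique(cohortTable$cohort_definition_id))
  if (length(missingIds) > 0) {
    stop("cohortId not found in table: ", paste(missingIds, collapse = ", "))
  }
  cohortTable[cohortTable$cohort_definition_id %in% cohortId, , drop = FALSE]
}

# Toy cohort table holding two cohort definitions
drugCohorts <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(10L, 11L, 10L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  cohort_end_date = as.Date(c("2020-01-31", "2020-02-28", "2020-03-31"))
)

allEvents <- selectCohorts(drugCohorts)                 # NULL: keep all cohorts
onlySecond <- selectCohorts(drugCohorts, cohortId = 2)  # keep a single cohort
```

With one such call per role (target, event, exit), each role can point at its own table and default to everything that table contains.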

mvankessel-EMC commented 12 months ago

Could you give an example of what this would look like in the database? Could you set up some dummy code with CDMConnector to show what that would look like? It would give me something more robust to develop around.

edward-burn commented 11 months ago

@mvankessel-EMC here is an example with Eunomia. The nice thing about having separate tables is that if you have, say, multiple target cohorts, you could automatically identify these and run multiple analyses against the outcome table, etc. It is also nice because it makes it easy for the user to create drug cohorts using the DrugUtilisation package, exposure cohorts using generateCohortSet, etc., so it gives a lot of flexibility.

library(CDMConnector)
#> Warning: package 'CDMConnector' was built under R version 4.2.3
library(DrugUtilisation)
#> 
#> Attaching package: 'DrugUtilisation'
#> The following object is masked from 'package:CDMConnector':
#> 
#>     generateConceptCohortSet
library(CohortSurvival)
library(CodelistGenerator)

con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_schema = "main", write_schema = "main")
cdm
#> # OMOP CDM reference (tbl_duckdb_connection)
#> 
#> Tables: person, observation_period, visit_occurrence, visit_detail, condition_occurrence, drug_exposure, procedure_occurrence, device_exposure, measurement, observation, death, note, note_nlp, specimen, fact_relationship, location, care_site, provider, payer_plan_period, cost, drug_era, dose_era, condition_era, metadata, cdm_source, concept, vocabulary, domain, concept_class, concept_relationship, relationship, concept_synonym, concept_ancestor, source_to_concept_map, drug_strength

# target cohort
cdm <- CDMConnector::generateConceptCohortSet( 
  cdm = cdm,
  name = "sinusitis",
  limit = "first",
  conceptSet = list("sinusitis" = 40481087)
)
cohortSet(cdm$sinusitis)
#> # A tibble: 1 × 2
#>   cohort_definition_id cohort_name
#>                  <int> <chr>      
#> 1                    1 sinusitis
cohortCount(cdm$sinusitis)
#> # A tibble: 1 × 3
#>   cohort_definition_id number_records number_subjects
#>                  <int>          <dbl>           <dbl>
#> 1                    1           2686            2686

# outcome cohorts
druglist <- getDrugIngredientCodes(cdm, c("acetaminophen", "metformin"))
cdm <- generateDrugUtilisationCohortSet(
  cdm = cdm,
  name = "drug_cohorts",
  conceptSetList = druglist,
  gapEra = 30
)

# summarise characteristics
summary_characteristics <- summariseCharacteristics(
  cohort = cdm$sinusitis,
  cohortIntersect = list(
    "Medications" = list(
      targetCohortTable = "drug_cohorts", value = "flag", window = c(-365, 0)
    )))
#> → Ingredient: Acetaminophen (1125315) has been changed to ingredient__acetaminophen__1125315_
#> → some provided names were not in snake_case
#> → names have been changed to lower case
#> → special symbols in names have been changed to '_'
# dplyr is not attached in this reprex, so call glimpse() with its namespace
summary_characteristics %>%
  dplyr::glimpse()

# survival
summary_survival <- estimateSingleEventSurvival(
  cdm = cdm,
  targetCohortTable = "sinusitis",
  outcomeCohortTable = "drug_cohorts"
)
#> → Ingredient: Acetaminophen (1125315) has been changed to ingredient__acetaminophen__1125315_
#> → some provided names were not in snake_case
#> → names have been changed to lower case
#> → special symbols in names have been changed to '_'
#> → Ingredient: Acetaminophen (1125315) has been changed to ingredient__acetaminophen__1125315_
#> → some provided names were not in snake_case
#> → names have been changed to lower case
#> → special symbols in names have been changed to '_'
#> Getting overall estimates
summary_survival
#> # A tibble: 331,106 × 14
#>    cdm_name result_type group_name group_level strata_name strata_level variable
#>    <chr>    <chr>       <chr>      <chr>       <chr>       <chr>        <chr>   
#>  1 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  2 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  3 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  4 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  5 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  6 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  7 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  8 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#>  9 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#> 10 Synthea… Survival e… Cohort     sinusitis   Overall     Overall      Outcome 
#> # ℹ 331,096 more rows
#> # ℹ 7 more variables: variable_level <chr>, estimate_type <chr>,
#> #   variable_type <chr>, outcome <chr>, time <dbl>, analysis_type <chr>,
#> #   estimate <dbl>

# treatment patterns
# it would be nice to be able to do something like

# executeTreatmentPatterns(cdm=cdm, 
#                          targetCohortTable,
#                          eventCohortTable, 
#                          exitCohortTable,
#                          .....)

Created on 2023-10-12 with reprex v2.0.2
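Until something like that exists, one possible workaround is to stack the separate tables into a single cohort table and keep a lookup that records which role each cohort plays. The sketch below uses plain data frames and base R only. `combineCohortTables()` and the column names of its `cohorts` lookup (`cohortId`, `type`, `sourceId`) are assumptions made for illustration, not an existing TreatmentPatterns interface.

```r
# Hypothetical helper (not part of any package): stack target, event, and
# optional exit cohort tables into one table with non-overlapping IDs.
combineCohortTables <- function(target, event, exit = NULL) {
  tables <- list(target = target, event = event, exit = exit)
  tables <- tables[!vapply(tables, is.null, logical(1))]

  combined <- list()
  lookup <- list()
  offset <- 0L
  for (type in names(tables)) {
    tbl <- tables[[type]]
    oldIds <- sort(unique(tbl$cohort_definition_id))
    newIds <- offset + seq_along(oldIds)
    # Record the role and the original ID of every renumbered cohort
    lookup[[type]] <- data.frame(
      cohortId = newIds,
      type = rep(type, length(oldIds)),
      sourceId = oldIds
    )
    # Renumber so IDs from different tables cannot collide
    tbl$cohort_definition_id <- newIds[match(tbl$cohort_definition_id, oldIds)]
    combined[[type]] <- tbl
    offset <- offset + length(oldIds)
  }

  cohortTable <- do.call(rbind, combined)
  cohorts <- do.call(rbind, lookup)
  rownames(cohortTable) <- NULL
  rownames(cohorts) <- NULL
  list(cohortTable = cohortTable, cohorts = cohorts)
}

# Toy stand-ins for the sinusitis and drug_cohorts tables
sinusitis <- data.frame(
  cohort_definition_id = c(1L, 1L),
  subject_id = c(1L, 2L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-01-15")),
  cohort_end_date = as.Date(c("2020-12-31", "2020-12-31"))
)
drugs <- data.frame(
  cohort_definition_id = c(1L, 2L, 2L),
  subject_id = c(1L, 1L, 2L),
  cohort_start_date = as.Date(c("2020-02-01", "2020-03-01", "2020-02-10")),
  cohort_end_date = as.Date(c("2020-02-20", "2020-03-20", "2020-03-01"))
)

result <- combineCohortTables(target = sinusitis, event = drugs)
result$cohorts
```

Here both input tables use cohort_definition_id 1, so the event cohorts are renumbered after the target cohort, and the `sourceId` column lets results be mapped back to the original tables.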

darwin-eu-dev / TreatmentPatterns

Allow targetCohorts, eventCohorts, and exitCohorts to be in different tables #172