OHDSI / CohortGenerator

An R package for instantiating cohorts using data in the CDM.
https://ohdsi.github.io/CohortGenerator/

Cohort subset failure when retrieving subset definitions #83

Closed anthonysena closed 1 year ago

anthonysena commented 1 year ago

Using the new subset feature from #73, I'm running into the following problem when retrieving the subset definitions. Here is a reproducible example:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
sampleCohorts <- CohortGenerator::createEmptyCohortDefinitionSet()
cohortJsonFiles <- list.files(path = system.file("testdata/name/cohorts", package = "CohortGenerator"), full.names = TRUE)
cohortJsonFileName <- cohortJsonFiles[1]
cohortName <- tools::file_path_sans_ext(basename(cohortJsonFileName))
cohortJson <- readChar(cohortJsonFileName, file.info(cohortJsonFileName)$size)
sampleCohorts <- rbind(sampleCohorts, data.frame(
  cohortId = as.double(1),
  cohortName = cohortName,
  json = cohortJson,
  sql = "",
  stringsAsFactors = FALSE
))

# Limit to male only
subsetDef2 <- CohortGenerator::createCohortSubsetDefinition(
  name = "Male Only",
  definitionId = 2,
  subsetOperators = list(
    CohortGenerator::createDemographicSubset(id = 4,
                                             name = "Male",
                                             gender = 8507)
  )
)

sampleCohortsWithSubsets <- sampleCohorts %>%
  CohortGenerator::addCohortSubsetDefinition(subsetDef2)

CohortGenerator::getSubsetDefinitions(sampleCohortsWithSubsets)
#> Error in subsetDef$clone(deep = TRUE): attempt to apply non-function

sampleCohortsWithSubsets
#>   cohortId                   cohortName
#> 1        1                    celecoxib
#> 2     1002 celecoxib - Male Only (Male)
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             json
#> 1 {\n  "ConceptSets": [\n    {\n      "id": 0,\n      "name": "Celecoxib",\n      "expression": {\n        "items": [\n          {\n            "concept": {\n              "CONCEPT_CLASS_ID": "Ingredient",\n              "CONCEPT_CODE": "140587",\n              "CONCEPT_ID": 1118084,\n              "CONCEPT_NAME": "celecoxib",\n              "DOMAIN_ID": "Drug",\n              "INVALID_REASON": "V",\n              "INVALID_REASON_CAPTION": "Valid",\n              "STANDARD_CONCEPT": "S",\n              "STANDARD_CONCEPT_CAPTION": "Standard",\n              "VOCABULARY_ID": "RxNorm"\n            }\n          }\n        ]\n      }\n    }\n  ],\n  "PrimaryCriteria": {\n    "CriteriaList": [\n      {\n        "DrugEra": {\n          "CodesetId": 0\n        }\n      }\n    ],\n    "ObservationWindow": {\n      "PriorDays": 0,\n      "PostDays": 0\n    },\n    "PrimaryCriteriaLimit": {\n      "Type": "First"\n    }\n  },\n  "QualifiedLimit": {\n    "Type": "First"\n  },\n  "ExpressionLimit": {\n    "Type": "First"\n  },\n  "InclusionRules": [],\n  "CensoringCriteria": [],\n  "CollapseSettings": {\n    "CollapseType": "ERA",\n    "EraPad": 0\n  },\n  "CensorWindow": {},\n  "cdmVersionRange": ">=5.0.0"\n}
#> 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             {\n  "cohortId": [1002],\n  "targetCohortId": [1],\n  "subsetDefinitionId": [2]\n}
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      sql
#> 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
#> 2 DELETE FROM @cohort_database_schema.@cohort_table WHERE cohort_definition_id = 1002;\nDROP TABLE IF EXISTS #cohort_sub_base;\nSELECT * INTO #cohort_sub_base FROM @cohort_database_schema.@cohort_table\nWHERE cohort_definition_id = 1;\nDROP TABLE IF EXISTS #S_4;\n SELECT\n  T.subject_id,\n  T.cohort_start_date,\n  T.cohort_end_date\nINTO #S_4\nFROM #cohort_sub_base T\nJOIN @cdm_database_schema.person p ON T.subject_id = p.person_id\nWHERE 1 = 1-- simplifies ternary logic\nAND p.gender_concept_id IN (8507)\n\n\nAND YEAR(T.cohort_start_date) - p.year_of_birth >= 0\nAND YEAR(T.cohort_start_date) - p.year_of_birth <= 9999;\nINSERT INTO @cohort_database_schema.@cohort_table\nSELECT\n    1002 as cohort_definition_id,\n    T.subject_id,\n    T.cohort_start_date,\n    T.cohort_end_date\nFROM #S_4 T;\n\nDROP TABLE IF EXISTS #cohort_sub_base;\nDROP TABLE IF EXISTS #S_4;
#>   subsetParent isSubset subsetDefinitionId
#> 1            1    FALSE                 NA
#> 2            1     TRUE                  2

attributes(sampleCohortsWithSubsets)
#> $names
#> [1] "cohortId"           "cohortName"         "json"              
#> [4] "sql"                "subsetParent"       "isSubset"          
#> [7] "subsetDefinitionId"
#> 
#> $row.names
#> [1] 1 2
#> 
#> $cohortSubsetDefinitions
#> $cohortSubsetDefinitions[[1]]
#> NULL
#> 
#> $cohortSubsetDefinitions[[2]]
#> <CohortSubsetDefinition>
#>   Public:
#>     addSubsetOperator: function (subsetOperator) 
#>     clone: function (deep = FALSE) 
#>     definitionId: active binding
#>     getJsonFileName: function (subsetJsonFolder = "inst/cohort_subset_definitions/") 
#>     getSubsetCohortName: function (cohortDefinitionSet, targetOutputPair) 
#>     getSubsetOperatorById: function (id) 
#>     getSubsetQuery: function (targetOutputPair) 
#>     identifierExpression: active binding
#>     initialize: function (definition = NULL) 
#>     name: active binding
#>     setTargetOutputPairs: function (targetIds) 
#>     subsetIds: active binding
#>     subsetOperators: active binding
#>     targetOutputPairs: active binding
#>     toJSON: function () 
#>     toList: function () 
#>   Private:
#>     .definitionId: 2
#>     .identifierExpression: expression
#>     .name: Male Only
#>     .subsetIds: 4
#>     .subsetOperators: list
#>     .targetOutputPairs: list
#>     createSubset: function (item, itemClass = item$subsetType) 
#> 
#> 
#> $class
#> [1] "data.frame"
#> 
#> $hasSubsetDefinitions
#> [1] TRUE

Created on 2023-02-28 with reprex v2.0.2

This is happening in this code block:

https://github.com/OHDSI/CohortGenerator/blob/5437d9877d8864ac6c24138cd5f96af0f14f6195/R/SubsetDefinitions.R#L438-L441

where subsetDef is NULL, so subsetDef$clone is not a function and the call fails.

It appears that the list created by https://github.com/OHDSI/CohortGenerator/blob/develop/R/SubsetDefinitions.R#L321 contains extra NULL entries, whereas the expectation is that the list enumerates only the subset definitions in use. I'm noting this issue before I dive into the investigation.
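The failure mode can be reproduced in base R without CohortGenerator. The sketch below is only an illustration: `defs` and `fakeDef` are hypothetical stand-ins for the `cohortSubsetDefinitions` attribute and a subset definition object, and dropping NULL entries before cloning is one possible guard, not necessarily the fix the package adopted.

```r
# Hypothetical stand-in for a subset definition: a plain list with a clone() function.
fakeDef <- list(clone = function(deep = FALSE) "cloned definition")

# Hypothetical stand-in for the attribute: a NULL in slot 1, a definition in slot 2.
defs <- list(NULL, fakeDef)

# NULL$clone evaluates to NULL, so calling it raises the error seen in the reprex.
msg <- tryCatch(
  defs[[1]]$clone(deep = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible guard (an assumption, not the package's code): drop NULL entries first.
kept <- Filter(Negate(is.null), defs)
clones <- lapply(kept, function(subsetDef) subsetDef$clone(deep = TRUE))
print(length(clones))
```

Filtering hides the symptom only; the NULL entry is still created wherever the list is populated.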

anthonysena commented 1 year ago

I believe this is the problem:

https://github.com/OHDSI/CohortGenerator/blob/5437d9877d8864ac6c24138cd5f96af0f14f6195/R/SubsetDefinitions.R#L343

Based on this definition:

subsetDef2 <- CohortGenerator::createCohortSubsetDefinition(
  name = "Male Only",
  definitionId = 2,
  subsetOperators = list(
    CohortGenerator::createDemographicSubset(id = 4,
                                             name = "Male",
                                             gender = 8507)
  )
)

My code sets definitionId = 2 above, and when the subset definition is added to the cohortDefinitionSet it is stored at position 2 of the list. That leaves the 1st element empty (NULL), which causes the problem.
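The padding behaviour is standard for R lists and can be shown in a few lines. This is a minimal sketch using hypothetical variables (`byPosition`, `byName`); it does not reproduce the code in SubsetDefinitions.R, and keying by a character name is just one way to avoid the gap.

```r
definitionId <- 2

# Assigning by numeric position pads every earlier slot with NULL.
byPosition <- list()
byPosition[[definitionId]] <- "Male Only"
print(length(byPosition))         # 2
print(is.null(byPosition[[1]]))   # TRUE

# Assumed alternative: key the entry by the id as a character name instead.
byName <- list()
byName[[as.character(definitionId)]] <- "Male Only"
print(length(byName))             # 1
print(names(byName))              # "2"
```

With a name-keyed list, any definitionId can be used without leaving empty slots, and lookups go through `byName[[as.character(id)]]`.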