exposureOutcomeSetId not found when using multiple outcomeIds in exposuresOutcomeList

mvankessel-EMC commented 1 month ago

First of all I'm not entirely sure if this use case is valid, so if not please let me know how to do this appropriately.

Basically, I have several outcomes and one exposure, and I would like to run SCCS per outcome with that one exposure. If I do this one outcome at a time, it works great and I get the expected results. However, if I follow the MultipleAnalyses vignette, I run into issues: exportToCsv() throws the following error:

#> Error in `filter()`:
#> ℹ In argument: `&...`.
#> Caused by error in `.data$exposuresOutcomeSetId`:
#> ! Column `exposuresOutcomeSetId` not found in `.data`.

First I create my cohort table and change half of the cohort_definition_id values from 77 to 76, to simulate multiple outcomes.

library(SelfControlledCaseSeries)
#> Loading required package: Cyclops
#> Loading required package: DatabaseConnector
#> Loading required package: Andromeda
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(Eunomia)

connectionDetails <- getEunomiaConnectionDetails()

outputFolder <- file.path(tempdir(), "eunomia-results")
exportFolder <- file.path(tempdir(), "eunomia-export")
unlink(outputFolder, recursive = TRUE, force = TRUE)

cohortDatabaseSchema <- "main"
cdmDatabaseSchema <- "main"

giBleed <- 77
diclofenac <- 1124300
cohortDefinitionSet <- PhenotypeLibrary::getPlCohortDefinitionSet(giBleed)

connection <- DatabaseConnector::connect(connectionDetails)
#> Connecting using SQLite driver

cohortTableNames <- CohortGenerator::getCohortTableNames(cohortTable = "cohort_table")
CohortGenerator::createCohortTables(
  connection = connection,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTableNames
)
#> Creating cohort tables
#> - Created table main.cohort_table
#> - Created table main.cohort_table
#> - Created table main.cohort_table_inclusion
#> - Created table main.cohort_table_inclusion_result
#> - Created table main.cohort_table_inclusion_stats
#> - Created table main.cohort_table_summary_stats
#> - Created table main.cohort_table_censor_stats
#> Creating cohort tables took 0.13secs

counts <- CohortGenerator::generateCohortSet(
  connection = connection,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortDatabaseSchema = cohortDatabaseSchema,
  cohortTableNames = cohortTableNames,
  cohortDefinitionSet = cohortDefinitionSet
)
#> Initiating cluster consisting only of main thread
#> 1/1- Generating cohort: Gastrointestinal bleeding with inpatient admission (id = 77)
#>   |======================================================================| 100%
#> Executing SQL took 0.0759 secs
#> Generating cohort set took 0.41 secs

cohortTable <- DatabaseConnector::renderTranslateQuerySql(
  connection = connection,
  sql = "SELECT * FROM cohort_table;"
)

# Set first half of cohort_definition_id to 76
cohortTable$COHORT_DEFINITION_ID[seq_len(floor(nrow(cohortTable) / 2))] <- 76

# Overwrite cohort_table
DatabaseConnector::insertTable(
  connection = connection,
  databaseSchema = "main",
  tableName = "cohort_table",
  data = cohortTable,
  dropTableIfExists = TRUE
)
#> Inserting data took 0.0159 secs

DatabaseConnector::disconnect(connection)

I then set up the SCCS arguments.

diclofenac <- 1124300
outcomeIds <- c(
  # GiBleed => representing outcome A
  76,
  # GiBleed => representing outcome B
  77
)

# List of 2 outcomes, both with exposure diclofenac
exposuresOutcomeList <- lapply(
  X = outcomeIds, # c(76, 77)
  FUN = createExposuresOutcome,
  exposures = list(createExposure(exposureId = diclofenac))
)

getDbSccsDataArgs <- createGetDbSccsDataArgs(
  useCustomCovariates = FALSE,
  deleteCovariatesSmallCount = 100,
  exposureIds = c(),
  maxCasesPerOutcome = 100000
)

createStudyPopulationArgs <- createCreateStudyPopulationArgs(
  naivePeriod = 180,
  firstOutcomeOnly = FALSE
)

covarExposureOfInt <- createEraCovariateSettings(
  label = "Exposure of interest",
  start = 1,
  end = 0,
  endAnchor = "era end",
  profileLikelihood = TRUE
)

createSccsIntervalDataArgs1 <- createCreateSccsIntervalDataArgs(
  eraCovariateSettings = covarExposureOfInt
)

fitSccsModelArgs <- createFitSccsModelArgs()

analyses <- list(createSccsAnalysis(
  analysisId = 1,
  description = "Two outcomes, one exposure",
  getDbSccsDataArgs = getDbSccsDataArgs,
  createStudyPopulationArgs = createStudyPopulationArgs,
  createIntervalDataArgs = createSccsIntervalDataArgs1,
  fitSccsModelArgs = fitSccsModelArgs
))

referenceTable <- runSccsAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  exposureDatabaseSchema = "main",
  exposureTable = "drug_era",
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort_table",
  cdmVersion = "5",
  outputFolder = outputFolder,
  combineDataFetchAcrossOutcomes = TRUE,
  exposuresOutcomeList = exposuresOutcomeList,
  sccsAnalysisList = analyses
)
#> *** Creating sccsData objects ***
#> Initiating cluster consisting only of main thread
#> Connecting using SQLite driver
#> Inserting data took 0.0275 secs
#> Selecting outcomes
#>   |======================================================================| 100%
#> Executing SQL took 0.0036 secs
#> Creating cases
#>   |======================================================================| 100%
#> Executing SQL took 0.00487 secs
#> Counting outcomes
#> 
#> Creating eras
#>   |======================================================================| 100%
#> Executing SQL took 0.0335 secs
#> Fetching data from server
#> Fetched 391 cases from server
#> Getting SCCS data from server took 0.565 secs
#> Disconnected Andromeda. This data object can no longer be used
#> *** Creating studyPopulation objects ***
#> Initiating cluster consisting only of main thread
#> *** Creating sccsIntervalData objects ***
#> Initiating cluster consisting only of main thread
#> Converting person data to SCCS intervals. This might take a while.
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#>   |======================================================================| 100%
#> Generating SCCS interval data took 0.257 secs
#> Disconnected Andromeda. This data object can no longer be used
#> Converting person data to SCCS intervals. This might take a while.
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#>   |======================================================================| 100%
#> Generating SCCS interval data took 0.259 secs
#> Disconnected Andromeda. This data object can no longer be used
#> *** Fitting models ***
#> Initiating cluster consisting only of main thread
#> Fitting SCCS model
#> Using prior: None
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Fitting the model took 0.0993 secs
#> Model fitting status is: OK
#> Fitting SCCS model
#> Using prior: None
#> 
#> Warning: BLR gradient is ill-conditioned
#> Enforcing convergence!
#> Warning in Cyclops::fitCyclopsModel(cyclopsData, prior = prior, control =
#> control): BLR convergence criterion failed; coefficient may be infinite
#> Using prior: None
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Using 1 thread(s)
#> Fitting the model took 0.0962 secs
#> Model fitting status is: OK
#> *** Summarizing results ***
#>   |======================================================================| 100%

Running the analysis itself works fine. But exporting throws the error.

SelfControlledCaseSeries::exportToCsv(
  outputFolder = outputFolder,
  exportFolder = exportFolder,
  databaseId = "Eunomia",
  minCellCount = 5
)
#> Exporting results to CSV
#> - sccs_analysis table
#> - sccs_covariate_analysis table
#> - sccs_exposure and sccs_exposures_outcome_set tables
#> - sccs_age_spanning, sccs_attrition, sccs_calender_time_spanning, sccs_censor_model, sccs_covariate, sccs_covariate_result, sccs_diagnostics_summary, sccs_era, sccs_event_dep_observation, sccs_likelihood_profile, sccs_spline, sccs_time_to_event, and sccs_time_trend tables
#> Initiating cluster consisting only of main thread
#> Error in `filter()`:
#> ℹ In argument: `&...`.
#> Caused by error in `.data$exposuresOutcomeSetId`:
#> ! Column `exposuresOutcomeSetId` not found in `.data`.

Created on 2024-07-24 with reprex v2.1.1

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24 ucrt)
#>  os       Windows 11 x64 (build 22631)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Netherlands.utf8
#>  ctype    Dutch_Netherlands.utf8
#>  tz       Europe/Amsterdam
#>  date     2024-07-24
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package                  * version date (UTC) lib source
#>    Andromeda                * 0.6.6   2024-03-21 [1] CRAN (R 4.4.0)
#>    backports                  1.5.0   2024-05-23 [1] CRAN (R 4.4.0)
#>    bit                        4.0.5   2022-11-15 [1] CRAN (R 4.4.0)
#>    bit64                      4.0.5   2020-08-30 [1] CRAN (R 4.4.0)
#>    blob                       1.2.4   2023-03-17 [1] CRAN (R 4.4.0)
#>    cachem                     1.1.0   2024-05-16 [1] CRAN (R 4.4.0)
#>    checkmate                  2.3.1   2023-12-04 [1] CRAN (R 4.4.0)
#>    cli                        3.6.2   2023-12-11 [1] CRAN (R 4.4.0)
#>    CohortGenerator            0.9.0   2024-06-01 [1] Github (ohdsi/CohortGenerator@e3efad6)
#>    crayon                     1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
#>    Cyclops                  * 3.4.1   2024-06-06 [1] CRAN (R 4.4.0)
#>    DatabaseConnector        * 6.3.2   2023-12-11 [1] CRAN (R 4.4.0)
#>    DBI                        1.2.3   2024-06-02 [1] CRAN (R 4.4.0)
#>    dbplyr                     2.5.0   2024-03-19 [1] CRAN (R 4.4.0)
#>    digest                     0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
#>    dplyr                    * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>    Eunomia                  * 2.0.0   2024-05-30 [1] Github (OHDSI/Eunomia@f016f27)
#>    evaluate                   0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
#>    fansi                      1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
#>    fastmap                    1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>    fs                         1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
#>    generics                   0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
#>    glue                       1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
#>    hms                        1.1.3   2023-03-21 [1] CRAN (R 4.4.0)
#>    htmltools                  0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>    jsonlite                   1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
#>    knitr                      1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>    lattice                    0.22-6  2024-03-20 [1] CRAN (R 4.4.0)
#>    lifecycle                  1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>    lubridate                  1.9.3   2023-09-27 [1] CRAN (R 4.4.0)
#>    magrittr                   2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>    Matrix                     1.7-0   2024-03-22 [1] CRAN (R 4.4.0)
#>    memoise                    2.0.1   2021-11-26 [1] CRAN (R 4.4.1)
#>    ParallelLogger             3.3.0   2023-08-22 [1] CRAN (R 4.4.0)
#>    PhenotypeLibrary           3.32.0  2024-05-28 [1] Github (OHDSI/PhenotypeLibrary@bccdd87)
#>    pillar                     1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
#>    pkgconfig                  2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
#>    purrr                      1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>    R6                         2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
#>    Rcpp                       1.0.12  2024-01-09 [1] CRAN (R 4.4.0)
#>    readr                      2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
#>    reprex                     2.1.1   2024-07-06 [1] CRAN (R 4.4.1)
#>  D rJava                      1.0-11  2024-01-26 [1] CRAN (R 4.4.0)
#>    rlang                      1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>    rmarkdown                  2.27    2024-05-17 [1] CRAN (R 4.4.0)
#>    RSQLite                    2.3.7   2024-05-27 [1] CRAN (R 4.4.0)
#>    rstudioapi                 0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
#>    SelfControlledCaseSeries * 5.2.2   2024-07-24 [1] Github (OHDSI/SelfControlledCaseSeries@67b26f6)
#>    sessioninfo                1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>    SqlRender                  1.18.0  2024-05-30 [1] CRAN (R 4.4.0)
#>    survival                   3.7-0   2024-06-05 [1] CRAN (R 4.4.1)
#>    tibble                     3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>    tidyselect                 1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>    timechange                 0.3.0   2024-01-18 [1] CRAN (R 4.4.0)
#>    tzdb                       0.4.0   2023-05-12 [1] CRAN (R 4.4.0)
#>    utf8                       1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
#>    vctrs                      0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>    vroom                      1.6.5   2023-12-05 [1] CRAN (R 4.4.0)
#>    withr                      3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
#>    xfun                       0.45    2024-06-16 [1] CRAN (R 4.4.0)
#>    yaml                       2.3.8   2023-12-11 [1] CRAN (R 4.4.0)
#>    zip                        2.3.1   2024-01-27 [1] CRAN (R 4.4.0)
#>
#>  [1] C:/R/R-4.4.0/library
#>
#>  D ── DLL MD5 mismatch, broken installation.
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

mvankessel-EMC commented 1 month ago

Setting exposureOfInterest = TRUE and includeEraIds = "exposureId" allowed me to export the results. With those settings it passes this if statement and runs this code. Otherwise the resultsSummary.rds file is empty, and the export throws the error from my original post.

covarExposureOfInt <- createEraCovariateSettings(
  label = "Exposure of interest",
  start = 1,
  end = 0,
  endAnchor = "era end",
  profileLikelihood = TRUE,
  exposureOfInterest = TRUE,
  includeEraIds = "exposureId"
)

I'm not sure when you would ever set exposureOfInterest = FALSE.

schuemie commented 1 month ago

Thanks! I'll take a look. I usually explicitly set includeEraIds = "exposureId". The default for includeEraIds is NULL, which means all exposures in the data. In this case you only have one, so that should actually work.

exposureOfInterest = FALSE would be used for covariates in your model that you don't need to see in the summary results table (and don't require empirical calibration). For example, often you want a pre-exposure period to deal with contra-indications and healthy-vaccinee effects, but you probably aren't interested in seeing a calibrated estimate for that period in the summary table.
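A minimal sketch of that pattern, assuming the same arguments used earlier in this thread: the main exposure is flagged as the exposure of interest, while a pre-exposure window stays in the model but out of the summary table. The 30-day window length and the variable names are illustrative assumptions, not values from this analysis.

```r
library(SelfControlledCaseSeries)

# Main effect: reported in the summary table.
covarExposureOfInt <- createEraCovariateSettings(
  label = "Exposure of interest",
  includeEraIds = "exposureId",
  start = 1,
  end = 0,
  endAnchor = "era end",
  profileLikelihood = TRUE,
  exposureOfInterest = TRUE
)

# Pre-exposure window: adjusts the model but is left out of the summary.
# The 30-day length is an assumption chosen for illustration.
covarPreExposure <- createEraCovariateSettings(
  label = "Pre-exposure",
  includeEraIds = "exposureId",
  start = -30,
  end = -1,
  endAnchor = "era start",
  exposureOfInterest = FALSE
)

# Both covariates go into the same interval-data settings.
createSccsIntervalDataArgs <- createCreateSccsIntervalDataArgs(
  eraCovariateSettings = list(covarExposureOfInt, covarPreExposure)
)
```

With at least one covariate flagged as the exposure of interest, the summary table should no longer be empty at export time.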

schuemie commented 1 month ago

I dove into this, and the issue is that if no covariate is defined to be the exposure of interest, then the analysis summary table is empty (i.e. there are no results of interest, as the user defined). This causes all sorts of issues in the exportToCsv() function. Since I don't see a good use case for allowing this, I've modified the createSccsAnalysis() function to throw an error when no covariate is defined to be the exposure of interest.
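The kind of check described here could look roughly like the sketch below. This is not the package's actual implementation: `checkHasExposureOfInterest` is a hypothetical helper, and plain lists with an `exposureOfInterest` field stand in for real era covariate settings so the example runs in base R.

```r
# Hypothetical helper, not the code that was added to createSccsAnalysis().
# Each element of `eraCovariateSettings` is assumed to be a list with a
# logical `exposureOfInterest` field.
checkHasExposureOfInterest <- function(eraCovariateSettings) {
  flags <- vapply(
    eraCovariateSettings,
    function(settings) isTRUE(settings$exposureOfInterest),
    logical(1)
  )
  if (!any(flags)) {
    stop("At least one era covariate must have exposureOfInterest = TRUE")
  }
  invisible(TRUE)
}

# Plain lists stand in for real settings objects here.
withInterest <- list(
  list(label = "Exposure of interest", exposureOfInterest = TRUE)
)
withoutInterest <- list(
  list(label = "Pre-exposure", exposureOfInterest = FALSE)
)

# Passes silently.
checkHasExposureOfInterest(withInterest)

# Fails early, before any analysis is run or exported.
outcome <- tryCatch(
  checkHasExposureOfInterest(withoutInterest),
  error = function(e) conditionMessage(e)
)
```

Failing when the analysis is specified, rather than at export, points the user at the setting that needs to change instead of at a missing column.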

I also removed the default value of the includeEraIds argument of createEraCovariateSettings() to force the user to explicitly choose an exposure (or explicitly choose all exposures).

I've added a comment to the vignette that I'm sure will avoid all confusion on this topic in the future ;-)

OHDSI / SelfControlledCaseSeries

exposureOutcomeSetId not found when using multiple outcomeIds in exposuresOutcomeList #60