Open javier-gracia-tabuenca-tuni opened 3 years ago
Hi @javier-gracia-tabuenca-tuni - can you link me to where in CohortDiagnostics where you found this code? Upon first glance, the parameter useVisitCconceptCount
appears to be malformed (but might just be a typo here).
Sorry, this code is not in CohortDiagnostics.
I'm trying to run a custom cohort diagnosis which shows in Temporal Characterisation the count of visits in each time period.
Here is some code to reproduce the error:
library(tidyverse)
Sys.setenv(JAVA_HOME='C:\\Program Files\\AdoptOpenJDK\\jdk-8.0.232.09-hotspot\\')
library(SqlRender)
library(Eunomia)
library(CohortDiagnostics)
working_folder <- getwd()
connectionDetails <- getEunomiaConnectionDetails()
connection <- connect(connectionDetails)
cdmDatabaseSchema <- "main"
oracleTempSchema <- NULL
cohortDatabaseSchema <- "main"
cohortTable <- "tmp_cohort_table"
cohortSetReference <- tibble(
atlasId = 1776012 ,
atlasName = "Asthma",
cohortId = 1776012 ,
name = "Asthma"
)
createCohortTable(connectionDetails = connectionDetails,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable)
baseUrl <- "http://api.ohdsi.org:80/WebAPI"
inclusionStatisticsFolder <- file.path(working_folder, "incStats")
instantiateCohortSet(connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
oracleTempSchema = oracleTempSchema,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable,
baseUrl = baseUrl,
cohortSetReference = cohortSetReference,
generateInclusionStats = TRUE,
inclusionStatisticsFolder = inclusionStatisticsFolder)
my_tempCovaraitesSettings <- createTemporalCovariateSettings(
useVisitConceptCount = TRUE,
temporalStartDays = c(-365, 0 , 30*2+1, 365*1+1),
temporalEndDays = c( -1, 30*2, 365*1, 365*2+1)
)
databaseId <- "testDB"
exportFolder <- file.path(working_folder, "export")
runCohortDiagnostics(baseUrl = baseUrl,
cohortSetReference = cohortSetReference,
#
connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
oracleTempSchema = oracleTempSchema,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable,
#
databaseId = databaseId,
databaseName = databaseId,
databaseDescription = databaseId,
#
inclusionStatisticsFolder = inclusionStatisticsFolder,
exportFolder = exportFolder,
#
runInclusionStatistics = TRUE,
runIncludedSourceConcepts = TRUE,
runOrphanConcepts = TRUE,
runTimeDistributions = TRUE,
runBreakdownIndexEvents = TRUE,
runIncidenceRate = TRUE,
runCohortOverlap = TRUE,
runCohortCharacterization = TRUE,
minCellCount = 5,
#
runTemporalCohortCharacterization = TRUE,
temporalCovariateSettings = my_tempCovaraitesSettings
)
read_csv("./export/temporal_time_ref.csv")
read_csv("./export/temporal_covariate_value.csv")
preMergeDiagnosticsFiles(exportFolder)
launchDiagnosticsExplorer(exportFolder)
However, I traced the error and it seems to originate in FeatureExtraction. For this reason I posted it here.
Here is a simpler code to reproduce the error in FeatureExtraction:
library(tidyverse)
Sys.setenv(JAVA_HOME='C:\\Program Files\\AdoptOpenJDK\\jdk-8.0.232.09-hotspot\\')
library(SqlRender)
library(Eunomia)
library(FeatureExtraction)
working_folder <- getwd()
connectionDetails <- getEunomiaConnectionDetails()
connection <- connect(connectionDetails)
cdmDatabaseSchema <- "main"
oracleTempSchema <- NULL
cohortDatabaseSchema <- "main"
cohortTable <- "tmp_cohort_table"
cohortSetReference <- tibble(
atlasId = 1776012 ,
atlasName = "Asthma",
cohortId = 1776012 ,
name = "Asthma"
)
createCohortTable(connectionDetails = connectionDetails,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable)
baseUrl <- "http://api.ohdsi.org:80/WebAPI"
inclusionStatisticsFolder <- "~/incStats"
instantiateCohortSet(connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
oracleTempSchema = oracleTempSchema,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable,
baseUrl = baseUrl,
cohortSetReference = cohortSetReference,
generateInclusionStats = TRUE,
inclusionStatisticsFolder = inclusionStatisticsFolder)
my_tempCovaraitesSettings <- createTemporalCovariateSettings(
useVisitConceptCount = TRUE,
temporalStartDays = c(-365, 0 , 30*2+1, 365*1+1),
temporalEndDays = c( -1, 30*2, 365*1, 365*2+1)
)
tempCov <- getDbCovariateData(
connectionDetails = connectionDetails,
oracleTempSchema = oracleTempSchema,
cdmDatabaseSchema = cdmDatabaseSchema,
cohortTable = cohortTable,
cohortDatabaseSchema = cdmDatabaseSchema,
cohortTableIsTemp = FALSE,
cohortId = 1776012,
rowIdField = "subject_id",
covariateSettings = my_tempCovaraitesSettings,
aggregated = TRUE
)
tempCov$covariatesContinuous
Should be one value for each time_id
not only one NA
Hi @javier-gracia-tabuenca-tuni - thanks for the code to help reproduce this problem! I've taken your code and made some changes changed it since there were some functions you referenced that are missing and I don't have the Asthma cohort referenced. So, I constructed a very basic cohort to use anyone in the Eunomia data set that has an observation_period entry to keep things simple. Below is the code:
library(SqlRender)
library(Eunomia)
#> Loading required package: DatabaseConnector
library(FeatureExtraction)
#> Loading required package: Andromeda
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
connectionDetails <- getEunomiaConnectionDetails()
connection <- connect(connectionDetails)
#> Connecting using SQLite driver
cdmDatabaseSchema <- "main"
oracleTempSchema <- NULL
cohortDatabaseSchema <- "main"
cohortTable <- "cohort"
# Construct a cohort of people
cohortMembers <- querySql(connection, "SELECT 1 cohort_definition_id,
person_id subject_id,
observation_period_start_date cohort_start_date,
observation_period_end_date cohort_end_date
FROM observation_period")
# Insert into the cohort table
DatabaseConnector::insertTable(connection = connection,
tableName = cohortTable,
data = cohortMembers,
dropTableIfExists = FALSE,
createTable = FALSE)
# Characterize visit counts over the following time windows
tempCovaraitesSettings <- createTemporalCovariateSettings(
useVisitConceptCount = TRUE,
temporalStartDays = c(0,31,61,91),
temporalEndDays = c(30,60,90,365*2+1)
)
tempCov <- getDbCovariateData(
connectionDetails = connectionDetails,
oracleTempSchema = oracleTempSchema,
cdmDatabaseSchema = cdmDatabaseSchema,
cohortTable = cohortTable,
cohortDatabaseSchema = cdmDatabaseSchema,
cohortTableIsTemp = FALSE,
cohortId = 1,
rowIdField = "subject_id",
covariateSettings = tempCovaraitesSettings,
aggregated = TRUE
)
#> Connecting using SQLite driver
#> Sending temp tables to server
#> Constructing features on server
#> | | | 0% | |== | 4% | |===== | 7% | |======== | 11% | |========== | 14% | |============ | 18% | |=============== | 21% | |================== | 25% | |==================== | 29% | |====================== | 32% | |========================= | 36% | |============================ | 39% | |============================== | 43% | |================================ | 46% | |=================================== | 50% | |====================================== | 54% | |======================================== | 57% | |========================================== | 61% | |============================================= | 64% | |================================================ | 68% | |================================================== | 71% | |==================================================== | 75% | |======================================================= | 79% | |========================================================== | 82% | |============================================================ | 86% | |============================================================== | 89% | |================================================================= | 93% | |==================================================================== | 96% | |======================================================================| 100%
#> Executing SQL took 0.0273 secs
#> Fetching data from server
#> Fetching data took 0.125 secs
data.frame(tempCov$timeRef)
#> timeId startDay endDay
#> 1 1 0 30
#> 2 2 31 60
#> 3 3 61 90
#> 4 4 91 731
data.frame(tempCov$covariatesContinuous)
#> cohortDefinitionId covariateId countValue minValue maxValue averageValue
#> 1 1 9201911 15 0 1 0.002807412
#> standardDeviation medianValue p10Value p25Value p75Value p90Value timeId
#> 1 0 0 0 0 0 0 NA
Created on 2021-01-12 by the reprex package (v0.3.0)
As shown, the temporal analysis attempts to use 4 different time periods but the resulting continuous covariate result does not use the timeId
as you originally reported. So, I dug a bit deeper to try and find the issue and tracked it down to the ConceptCount.sql file. What I found is that some of the SQL operations in this file do not use the time_id
column throughout the analysis for temporal analysis. Additionally, I found the following code in that script that seems to imply that we do not want to consider the time_id as part of the results:
@schuemie - tagging you here to get your input on this particular code to ask: is this a bug or a decision to not include the temporal time identifiers as part of the results when constructing these types of covariates? Or does this perspective change at all when doing this at a patient level vs aggregating the results?
Thanks Anthon. The asthma cohort was in atlas-demo.ohdsi.org, but yeap, using query to build cohort much better for reproducing. I'll do it like that next time.
I see you dug deeper than me. Let's wait what @schuemie has to say.
Column
timeId
is all NAI can't find why this happens. I came to find this error when using CohortDiagnostics and I trying to get the temporal characterization of visit counts.