Closed conovermitch closed 8 months ago
Thanks for reporting this @conovermitch and @jamieweaver. I can confirm this is confusing and put together a small reproducible example to further illustrate the problem:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
connectionDetails <- Eunomia::getEunomiaConnectionDetails()
Eunomia::createCohorts(connectionDetails)
#> Connecting using SQLite driver
#> Creating cohort: Celecoxib
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0152 secs
#> Creating cohort: Diclofenac
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0182 secs
#> Creating cohort: GiBleed
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0307 secs
#> Creating cohort: NSAIDs
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0835 secs
#> Cohorts created in table main.cohort
#> cohortId name
#> 1 1 Celecoxib
#> 2 2 Diclofenac
#> 3 3 GiBleed
#> 4 4 NSAIDs
#> description
#> 1 A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
#> 2 A simplified cohort definition for new users ofdiclofenac, designed specifically for Eunomia.
#> 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
#> 4 A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
#> count
#> 1 1844
#> 2 850
#> 3 479
#> 4 2694
covSettings <- FeatureExtraction::createCovariateSettings(
useDrugGroupEraLongTerm = TRUE,
useDrugGroupEraStartLongTerm = TRUE
)
covariateData <- FeatureExtraction::getDbCovariateData(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
covariateSettings = covSettings,
aggregated = TRUE
)
#> Connecting using SQLite driver
#> Currently in a tryCatch or withCallingHandlers block, so unable to add global calling handlers. ParallelLogger will not capture R messages, errors, and warnings, only explicit calls to ParallelLogger. (This message will not be shown again this R session)
#> Constructing features on server
#> | | | 0% | |=== | 5% | |====== | 9% | |========== | 14% | |============= | 18% | |================ | 23% | |=================== | 27% | |====================== | 32% | |========================= | 36% | |============================= | 41% | |================================ | 45% | |=================================== | 50% | |====================================== | 55% | |========================================= | 59% | |============================================= | 64% | |================================================ | 68% | |=================================================== | 73% | |====================================================== | 77% | |========================================================= | 82% | |============================================================ | 86% | |================================================================ | 91% | |=================================================================== | 95% | |======================================================================| 100%
#> Executing SQL took 0.543 secs
#> Fetching data from server
#> Fetching data took 0.229 secs
<sup>Created on 2023-12-14 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
Note the names for analysis 410 (DrugGroupEraLongTerm) are the same as 414 (DrugGroupEraStartLongTerm)
> covariateData$covariateRef %>%
+ filter(analysisId == 410) %>%
+ select(covariateId, covariateName)
# Source: SQL [?? x 2]
# Database: sqlite 3.41.2 [C:\Users\asena5\AppData\Local\Temp\1\RtmpeexOFI\file93483a666678.sqlite]
covariateId covariateName
<dbl> <chr>
1 708298410 drug_era group during day -365 through 0 days relative to index: Midazolam
2 723013410 drug_era group during day -365 through 0 days relative to index: Diazepam
3 738818410 drug_era group during day -365 through 0 days relative to index: Doxylamine
4 740275410 drug_era group during day -365 through 0 days relative to index: Carbamazepine
5 753626410 drug_era group during day -365 through 0 days relative to index: Propofol
6 782043410 drug_era group during day -365 through 0 days relative to index: Isoflurane
7 914335410 drug_era group during day -365 through 0 days relative to index: Atropine
8 920293410 drug_era group during day -365 through 0 days relative to index: Nitrofurantoin
9 933724410 drug_era group during day -365 through 0 days relative to index: Phenazopyridine
10 967823410 drug_era group during day -365 through 0 days relative to index: Sodium Chloride
# ℹ more rows
# ℹ Use `print(n = ...)` to see more rows
> covariateData$covariateRef %>%
+ filter(analysisId == 414) %>%
+ select(covariateId, covariateName)
# Source: SQL [?? x 2]
# Database: sqlite 3.41.2 [C:\Users\asena5\AppData\Local\Temp\1\RtmpeexOFI\file93483a666678.sqlite]
covariateId covariateName
<dbl> <chr>
1 708298414 drug_era group during day -365 through 0 days relative to index: Midazolam
2 723013414 drug_era group during day -365 through 0 days relative to index: Diazepam
3 738818414 drug_era group during day -365 through 0 days relative to index: Doxylamine
4 753626414 drug_era group during day -365 through 0 days relative to index: Propofol
5 782043414 drug_era group during day -365 through 0 days relative to index: Isoflurane
6 914335414 drug_era group during day -365 through 0 days relative to index: Atropine
7 920293414 drug_era group during day -365 through 0 days relative to index: Nitrofurantoin
8 933724414 drug_era group during day -365 through 0 days relative to index: Phenazopyridine
9 967823414 drug_era group during day -365 through 0 days relative to index: Sodium Chloride
10 975125414 drug_era group during day -365 through 0 days relative to index: Hydrocortisone
# ℹ more rows
# ℹ Use `print(n = ...)` to see more rows
Hi @conovermitch, I have discussed a solution with @anthonysena
We propose to add the (unique) analysisName to the covariate name in brackets.
For 410 the covariate name will be:
drug_era group (DrugGroupEraLongTerm) during day -365 through 0 days relative to index: Midazolam
and for 414 it will be:
drug_era group (DrugGroupEraStartLongTerm) during day -365 through 0 days relative to index: Midazolam
how does that sound to you?
See screen grab below for a bug where FeatureExtraction seems to be producing two distinct covariates that have two different results but have the same value for covariateName.
analysisId=410: https://github.com/OHDSI/FeatureExtraction/blob/main/inst/csv/PrespecAnalyses.csv#L51
analysisId=414: https://github.com/OHDSI/FeatureExtraction/blob/main/inst/csv/PrespecAnalyses.csv#L55
The SQL that constructs the covariate and creates the covariateName is here: https://github.com/OHDSI/FeatureExtraction/blob/main/inst/sql/sql_server/DomainConceptGroup.sql
@jamieweaver and I were reviewing a Shiny app that showed the same covariate name with different values and after hunting for the problem we found the issue above.
Version of FeaturExtraction being run: FeatureExtraction v3.3.1