darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0
12 stars 10 forks source link

SQLite databases do not work #26

Closed raivokolde closed 1 month ago

raivokolde commented 2 months ago

I wanted to use CDMConnector on a sqlite database and failed to connect. I do not know if this is a bug or a feature, bu I expected CDM connector to be able to handle this as all the underlying technologies support SQLite very well.

Here is a reproducible example of code that gave me an error.

library(Eunomia)
library(CDMConnector)

sqlite_file = Eunomia::getDatabaseFile("GiBleed")

con = dbConnect(RSQLite::SQLite(), sqlite_file)
cdm = cdm_from_con(con, cdm_schema = "main", write_schema = "main")

The error looked like that: Screenshot 2024-09-09 at 12 16 51

catalamarti commented 2 months ago

Hi @raivokolde seems that sqlite is not supported currently.

You can create a eunomia reference like:

Sys.setenv("EUNOMIA_DATA_FOLDER" = here::here())
CDMConnector::downloadEunomiaData("GiBleed")
con <- duckdb::dbConnect(duckdb::duckdb(), CDMConnector::eunomiaDir("GiBleed"))
cdm <- CDMConnector::cdmFromCon(con, cdm_schema = "main", write_schema = "main")

@ablack3 on Friday I was working to create a branch that is compatible with sqlite: https://github.com/darwin-eu-dev/CDMConnector/tree/sqlite_support But at the end I found the internal code too messy to continue, I am happy to try to persist.

ablack3 commented 2 months ago

The main problem with sqlite cdms is date manipulation.

SqlRender handles this for OHDSI-SQL but I'm not sure how to handle it with dplyr code that needs to be cross platform.

You can see what SqlRender does to handle date functions in sqlite here: https://github.com/OHDSI/SqlRender/blob/020d26ca6a83de96675ba0256b0d2e443b6abfd9/inst/csv/replacementPatterns.csv#L832

duckdb eliminates this problem entirely and I'm not seeing a reason to use sqlite instead of duckdb. What would be the reason to use sqlite with all the date work-arounds instead of duckdb which has a date type and date manipulation SQL functions?

library(Eunomia)
library(dplyr)

sqlite_file = Eunomia::getDatabaseFile("GiBleed")
#> attempting to download GiBleed
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite

con = DBI::dbConnect(RSQLite::SQLite(), sqlite_file)

observation_period <- dplyr::tbl(con, "observation_period")

observation_period %>% 
  mutate(observation_length = observation_period_end_date - observation_period_end_date)
#> # Source:   SQL [?? x 6]
#> # Database: sqlite 3.46.0 [/private/var/folders/2j/8z0yfn1j69q8sxjc7vj9yhz40000gp/T/RtmpQAUmDg/file13bd319451472.sqlite]
#>    observation_period_id person_id observation_period_s…¹ observation_period_e…²
#>                    <int>     <int>                  <int>                  <int>
#>  1                     6         6             -189475200             1170720000
#>  2                    13        13             1240704000             1555200000
#>  3                    27        27             1012348800             1542758400
#>  4                    16        16               56246400             1509580800
#>  5                    55        55             1243641600             1553299200
#>  6                    60        60              659145600             1548201600
#>  7                    42        42            -1898553600             1552435200
#>  8                    33        33              516240000             1536537600
#>  9                    18        18             -130118400             1541548800
#> 10                    25        25             1174176000             1554595200
#> # ℹ more rows
#> # ℹ abbreviated names: ¹​observation_period_start_date,
#> #   ²​observation_period_end_date
#> # ℹ 2 more variables: period_type_concept_id <int>, observation_length <int>

DBI::dbDisconnect(con)

Created on 2024-09-09 with reprex v2.1.1

ablack3 commented 2 months ago

You can see here that DatabaseConnector tries to it seem as though the Sqlite CDM does contain dates but the workarounds have limits.

library(Eunomia)

sqlite_file = Eunomia::getDatabaseFile("GiBleed")

con <- DatabaseConnector::connect(dbms = "sqlite", server = sqlite_file)
#> Connecting using SQLite driver

# database connector tries to hide the fact that there are not dates
DatabaseConnector::querySql(con, "select * from observation_period") |> dplyr::tibble()
#> # A tibble: 5,343 × 5
#>    OBSERVATION_PERIOD_ID PERSON_ID OBSERVATION_PERIOD_S…¹ OBSERVATION_PERIOD_E…²
#>                    <dbl>     <dbl> <date>                 <date>                
#>  1                     6         6 1963-12-31             2007-02-06            
#>  2                    13        13 2009-04-26             2019-04-14            
#>  3                    27        27 2002-01-30             2018-11-21            
#>  4                    16        16 1971-10-14             2017-11-02            
#>  5                    55        55 2009-05-30             2019-03-23            
#>  6                    60        60 1990-11-21             2019-01-23            
#>  7                    42        42 1909-11-03             2019-03-13            
#>  8                    33        33 1986-05-12             2018-09-10            
#>  9                    18        18 1965-11-17             2018-11-07            
#> 10                    25        25 2007-03-18             2019-04-07            
#> # ℹ 5,333 more rows
#> # ℹ abbreviated names: ¹​OBSERVATION_PERIOD_START_DATE,
#> #   ²​OBSERVATION_PERIOD_END_DATE
#> # ℹ 1 more variable: PERIOD_TYPE_CONCEPT_ID <dbl>

# but the workarounds only go so far and eventually you run into a weird edge case that does not work
DatabaseConnector::querySql(con, "select observation_period_start_date as a from observation_period") |> dplyr::tibble()
#> # A tibble: 5,343 × 1
#>              A
#>          <dbl>
#>  1  -189475200
#>  2  1240704000
#>  3  1012348800
#>  4    56246400
#>  5  1243641600
#>  6   659145600
#>  7 -1898553600
#>  8   516240000
#>  9  -130118400
#> 10  1174176000
#> # ℹ 5,333 more rows

# in this case your column name needs to have the subsstring "date" in it
DatabaseConnector::querySql(con, "select observation_period_start_date as a_date from observation_period") |> dplyr::tibble()
#> # A tibble: 5,343 × 1
#>    A_DATE    
#>    <date>    
#>  1 1963-12-31
#>  2 2009-04-26
#>  3 2002-01-30
#>  4 1971-10-14
#>  5 2009-05-30
#>  6 1990-11-21
#>  7 1909-11-03
#>  8 1986-05-12
#>  9 1965-11-17
#> 10 2007-03-18
#> # ℹ 5,333 more rows

# But this has a consequence that could lead to unexpected type conversion
DatabaseConnector::querySql(con, "select person_id as a_date from observation_period") |> dplyr::tibble()
#> # A tibble: 5,343 × 1
#>    A_DATE    
#>    <date>    
#>  1 1970-01-01
#>  2 1970-01-01
#>  3 1970-01-01
#>  4 1970-01-01
#>  5 1970-01-01
#>  6 1970-01-01
#>  7 1970-01-01
#>  8 1970-01-01
#>  9 1970-01-01
#> 10 1970-01-01
#> # ℹ 5,333 more rows

DatabaseConnector::disconnect(con)

Created on 2024-09-09 with reprex v2.1.1

raivokolde commented 2 months ago

Thanks for explaining the reasons! My use-case was that I wanted to generate a toy dataset to play around with and sqlite seemed like an obvious option to use. Especially since OHDSI examples usually use it as well. I guess its easy to use duckdb as well, but as I understand I have to keep two versions of the dataset if I want to use both OHDSI and DARWIN tools on it, because on quick look DatabaseConnector does not support DuckDB.

However, if sqlite is officially not supported, then It would be nice it this is spelled out explicitly somewhere (with an example code, how to get the data into duckdb wasily). It really seems like an obvious thing to try. I spent quite a bit of time looking through the source, trying to figure out how to get the sqlite working.

ablack3 commented 2 months ago

No problem and thanks for your suggestion. In the next releases of CDMConnector and DatabaseConnector you should see better interoperability. But currently you can use DatabaseConnector and OHDSI tools with duckdb CDMs. Here's some example code that runs Achilles (you do need to install the development version of Achilles though I think).

# CDMConnector has several example datasets includeing the GIBleed dataset from Eunomia

library(CDMConnector)

# here are the example cdm datasets that are available.
example_datasets()
#>  [1] "GiBleed"                             "synthea-allergies-10k"              
#>  [3] "synthea-anemia-10k"                  "synthea-breast_cancer-10k"          
#>  [5] "synthea-contraceptives-10k"          "synthea-covid19-10k"                
#>  [7] "synthea-covid19-200k"                "synthea-dermatitis-10k"             
#>  [9] "synthea-heart-10k"                   "synthea-hiv-10k"                    
#> [11] "synthea-lung_cancer-10k"             "synthea-medications-10k"            
#> [13] "synthea-metabolic_syndrome-10k"      "synthea-opioid_addiction-10k"       
#> [15] "synthea-rheumatoid_arthritis-10k"    "synthea-snf-10k"                    
#> [17] "synthea-surgery-10k"                 "synthea-total_joint_replacement-10k"
#> [19] "synthea-veteran_prostate_cancer-10k" "synthea-veterans-10k"               
#> [21] "synthea-weight_loss-10k"             "empty_cdm"

# create a copy in a new tempfile. You can also specify a location.
duckdb_file <- eunomia_dir("GiBleed")

con <- DBI::dbConnect(duckdb::duckdb(), duckdb_file)
cdm <- cdm_from_con(con, "main", "main")
#> Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
#>  target signature 'duckdb_connection#Id'.
#>  "duckdb_connection#ANY" would also be valid

cdm$observation_period
#> # Source:   table<main.observation_period> [?? x 5]
#> # Database: DuckDB v1.0.0 [root@Darwin 23.1.0:R 4.3.3//private/var/folders/2j/8z0yfn1j69q8sxjc7vj9yhz40000gp/T/RtmpcrkxAJ/file13df548e6f84d.duckdb]
#>    observation_period_id person_id observation_period_s…¹ observation_period_e…²
#>                    <int>     <int> <date>                 <date>                
#>  1                     6         6 1963-12-31             2007-02-06            
#>  2                    13        13 2009-04-26             2019-04-14            
#>  3                    27        27 2002-01-30             2018-11-21            
#>  4                    16        16 1971-10-14             2017-11-02            
#>  5                    55        55 2009-05-30             2019-03-23            
#>  6                    60        60 1990-11-21             2019-01-23            
#>  7                    42        42 1909-11-03             2019-03-13            
#>  8                    33        33 1986-05-12             2018-09-10            
#>  9                    18        18 1965-11-17             2018-11-07            
#> 10                    25        25 2007-03-18             2019-04-07            
#> # ℹ more rows
#> # ℹ abbreviated names: ¹​observation_period_start_date,
#> #   ²​observation_period_end_date
#> # ℹ 1 more variable: period_type_concept_id <int>

DBI::dbDisconnect(con)

# You can also use these example cdms with DatabaseConnector 

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "duckdb",
  server = duckdb_file
)

con <- DatabaseConnector::connect(connectionDetails)
#> Connecting using DuckDB driver
DatabaseConnector::querySql(con, "select * from observation_period") |> dplyr::tibble()
#> # A tibble: 5,343 × 5
#>    OBSERVATION_PERIOD_ID PERSON_ID OBSERVATION_PERIOD_S…¹ OBSERVATION_PERIOD_E…²
#>                    <dbl>     <dbl> <date>                 <date>                
#>  1                     6         6 1963-12-31             2007-02-06            
#>  2                    13        13 2009-04-26             2019-04-14            
#>  3                    27        27 2002-01-30             2018-11-21            
#>  4                    16        16 1971-10-14             2017-11-02            
#>  5                    55        55 2009-05-30             2019-03-23            
#>  6                    60        60 1990-11-21             2019-01-23            
#>  7                    42        42 1909-11-03             2019-03-13            
#>  8                    33        33 1986-05-12             2018-09-10            
#>  9                    18        18 1965-11-17             2018-11-07            
#> 10                    25        25 2007-03-18             2019-04-07            
#> # ℹ 5,333 more rows
#> # ℹ abbreviated names: ¹​OBSERVATION_PERIOD_START_DATE,
#> #   ²​OBSERVATION_PERIOD_END_DATE
#> # ℹ 1 more variable: PERIOD_TYPE_CONCEPT_ID <dbl>
DatabaseConnector::disconnect(con)

# let's try running Achilles

results <- Achilles::achilles(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  sourceName = "Eunomia_gibleed")
#> Currently in a tryCatch or withCallingHandlers block, so unable to add global calling handlers. ParallelLogger will not capture R messages, errors, and warnings, only explicit calls to ParallelLogger. (This message will not be shown again this R session)
#> Performing database characterization.
#> CDM Version 5.3.1 found in cdm_source table.
#> Beginning single-threaded execution
#> Running 194 analyses.
#> Analysis 0 (Source name) -- START
#> [Main Analysis] [COMPLETE] 0 (0.006437 secs)
#> Analysis 1 (Number of persons) -- START
#> [Main Analysis] [COMPLETE] 1 (0.001978 secs)
#> Analysis 2 (Number of persons by gender) -- START
#> [Main Analysis] [COMPLETE] 2 (0.002398 secs)
#> Analysis 3 (Number of persons by year of birth) -- START
#> [Main Analysis] [COMPLETE] 3 (0.002558 secs)
#> Analysis 4 (Number of persons by race) -- START
#> [Main Analysis] [COMPLETE] 4 (0.002291 secs)
#> Analysis 5 (Number of persons by ethnicity) -- START
#> [Main Analysis] [COMPLETE] 5 (0.002167 secs)
#> Analysis 10 (Number of all persons by year of birth by gender) -- START
#> [Main Analysis] [COMPLETE] 10 (0.002586 secs)
#> Analysis 11 (Number of non-deceased persons by year of birth by gender) -- START
#> [Main Analysis] [COMPLETE] 11 (0.002746 secs)
#> Analysis 12 (Number of persons by race and ethnicity) -- START
#> [Main Analysis] [COMPLETE] 12 (0.002336 secs)
#> Analysis 101 (Number of persons by age, with age at first observation period) -- START
#> [Main Analysis] [COMPLETE] 101 (0.002600 secs)
#> Analysis 102 (Number of persons by gender by age, with age at first observation period) -- START
#> [Main Analysis] [COMPLETE] 102 (0.002590 secs)
#> Analysis 103 (Distribution of age at first observation period) -- START
#> [Main Analysis] [COMPLETE] 103 (0.005243 secs)
#> Analysis 104 (Distribution of age at first observation period by gender) -- START
#> [Main Analysis] [COMPLETE] 104 (0.007521 secs)
#> Analysis 105 (Length of observation (days) of first observation period) -- START
#> [Main Analysis] [COMPLETE] 105 (0.064705 secs)
#> Analysis 106 (Length of observation (days) of first observation period by gender) -- START
#> [Main Analysis] [COMPLETE] 106 (0.018555 secs)
#> Analysis 107 (Length of observation (days) of first observation period by age decile) -- START
#> [Main Analysis] [COMPLETE] 107 (0.020521 secs)
#> Analysis 108 (Number of persons by length of observation period, in 30d increments) -- START
#> [Main Analysis] [COMPLETE] 108 (0.004477 secs)
#> Analysis 109 (Number of persons with continuous observation in each year) -- START
#> [Main Analysis] [COMPLETE] 109 (0.015182 secs)
#> Analysis 110 (Number of persons with continuous observation in each month) -- START
#> [Main Analysis] [COMPLETE] 110 (0.053935 secs)
#> Analysis 111 (Number of persons by observation period start month) -- START
#> [Main Analysis] [COMPLETE] 111 (0.003160 secs)
#> Analysis 112 (Number of persons by observation period end month) -- START
#> [Main Analysis] [COMPLETE] 112 (0.002821 secs)
#> Analysis 113 (Number of persons by number of observation periods) -- START
#> [Main Analysis] [COMPLETE] 113 (0.002840 secs)
#> Analysis 116 (Number of persons with at least one day of observation in each year by gender and age decile) -- START
#> [Main Analysis] [COMPLETE] 116 (0.017631 secs)
#> Analysis 117 (Number of persons with at least one day of observation in each month) -- START
#> [Main Analysis] [COMPLETE] 117 (0.101805 secs)
#> Analysis 119 (Number of observation period records by period_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 119 (0.002037 secs)
#> Analysis 200 (Number of persons with at least one visit occurrence, by visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 200 (0.003228 secs)
#> Analysis 201 (Number of visit occurrence records, by visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 201 (0.002397 secs)
#> Analysis 202 (Number of persons by visit occurrence start month, by visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 202 (0.003086 secs)
#> Analysis 203 (Number of distinct visit occurrence concepts per person) -- START
#> [Main Analysis] [COMPLETE] 203 (0.007279 secs)
#> Analysis 204 (Number of persons with at least one visit occurrence, by visit_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 204 (0.005820 secs)
#> Analysis 206 (Distribution of age by visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 206 (0.010442 secs)
#> Analysis 207 (Number of visit records with invalid person_id) -- START
#> [Main Analysis] [COMPLETE] 207 (0.002197 secs)
#> Analysis 209 (Number of visit records with invalid care_site_id) -- START
#> [Main Analysis] [COMPLETE] 209 (0.001867 secs)
#> Analysis 210 (Number of visit_occurrence records outside a valid observation period) -- START
#> [Main Analysis] [COMPLETE] 210 (0.002075 secs)
#> Analysis 212 (Number of persons with at least one visit occurrence, by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 212 (0.003168 secs)
#> Analysis 213 (Distribution of length of stay by visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 213 (0.006806 secs)
#> Analysis 220 (Number of visit occurrence records by visit occurrence start month) -- START
#> [Main Analysis] [COMPLETE] 220 (0.002779 secs)
#> Analysis 221 (Number of persons by visit start year) -- START
#> [Main Analysis] [COMPLETE] 221 (0.002736 secs)
#> Analysis 225 (Number of visit_occurrence records by visit_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 225 (0.002239 secs)
#> Analysis 226 (Number of records by domain by visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 226 (0.005207 secs)
#> Analysis 300 (Number of providers) -- START
#> [Main Analysis] [COMPLETE] 300 (0.001765 secs)
#> Analysis 301 (Number of providers by specialty concept_id) -- START
#> [Main Analysis] [COMPLETE] 301 (0.001564 secs)
#> Analysis 303 (Number of providers records by specialty_concept_id and visit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 303 (0.001602 secs)
#> Analysis 325 (Number of provider records by specialty_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 325 (0.001471 secs)
#> Analysis 400 (Number of persons with at least one condition occurrence, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 400 (0.005009 secs)
#> Analysis 401 (Number of condition occurrence records, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 401 (0.003790 secs)
#> Analysis 402 (Number of persons by condition occurrence start month, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 402 (0.009424 secs)
#> Analysis 403 (Number of distinct condition occurrence concepts per person) -- START
#> [Main Analysis] [COMPLETE] 403 (0.010985 secs)
#> Analysis 404 (Number of persons with at least one condition occurrence, by condition_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 404 (0.010676 secs)
#> Analysis 405 (Number of condition occurrence records, by condition_concept_id by condition_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 405 (0.004389 secs)
#> Analysis 406 (Distribution of age by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 406 (0.017663 secs)
#> Analysis 414 (Number of condition occurrence records, by condition_status_concept_id) -- START
#> [Main Analysis] [COMPLETE] 414 (0.003416 secs)
#> Analysis 415 (Number of condition occurrence records, by condition_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 415 (0.003277 secs)
#> Analysis 416 (Number of condition occurrence records, by condition_status_concept_id, condition_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 416 (0.003255 secs)
#> Analysis 420 (Number of condition occurrence records by condition occurrence start month) -- START
#> [Main Analysis] [COMPLETE] 420 (0.004954 secs)
#> Analysis 425 (Number of condition_occurrence records by condition_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 425 (0.003760 secs)
#> Analysis 500 (Number of persons with death, by cause_concept_id) -- START
#> [Main Analysis] [COMPLETE] 500 (0.001643 secs)
#> Analysis 501 (Number of records of death, by cause_concept_id) -- START
#> [Main Analysis] [COMPLETE] 501 (0.001571 secs)
#> Analysis 502 (Number of persons by death month) -- START
#> [Main Analysis] [COMPLETE] 502 (0.001740 secs)
#> Analysis 504 (Number of persons with a death, by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 504 (0.001871 secs)
#> Analysis 505 (Number of death records, by death_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 505 (0.001525 secs)
#> Analysis 506 (Distribution of age at death by gender) -- START
#> [Main Analysis] [COMPLETE] 506 (0.007068 secs)
#> Analysis 511 (Distribution of time from death to last condition) -- START
#> [Main Analysis] [COMPLETE] 511 (0.002726 secs)
#> Analysis 512 (Distribution of time from death to last drug) -- START
#> [Main Analysis] [COMPLETE] 512 (0.004224 secs)
#> Analysis 513 (Distribution of time from death to last visit) -- START
#> [Main Analysis] [COMPLETE] 513 (0.004303 secs)
#> Analysis 514 (Distribution of time from death to last procedure) -- START
#> [Main Analysis] [COMPLETE] 514 (0.004349 secs)
#> Analysis 515 (Distribution of time from death to last observation) -- START
#> [Main Analysis] [COMPLETE] 515 (0.004112 secs)
#> Analysis 525 (Number of death records by cause_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 525 (0.001492 secs)
#> Analysis 600 (Number of persons with at least one procedure occurrence, by procedure_concept_id) -- START
#> [Main Analysis] [COMPLETE] 600 (0.003836 secs)
#> Analysis 601 (Number of procedure occurrence records, by procedure_concept_id) -- START
#> [Main Analysis] [COMPLETE] 601 (0.003277 secs)
#> Analysis 602 (Number of persons by procedure occurrence start month, by procedure_concept_id) -- START
#> [Main Analysis] [COMPLETE] 602 (0.005920 secs)
#> Analysis 603 (Number of distinct procedure occurrence concepts per person) -- START
#> [Main Analysis] [COMPLETE] 603 (0.008993 secs)
#> Analysis 604 (Number of persons with at least one procedure occurrence, by procedure_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 604 (0.006924 secs)
#> Analysis 605 (Number of procedure occurrence records, by procedure_concept_id by procedure_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 605 (0.003371 secs)
#> Analysis 606 (Distribution of age by procedure_concept_id) -- START
#> [Main Analysis] [COMPLETE] 606 (0.015801 secs)
#> Analysis 620 (Number of procedure occurrence records  by procedure occurrence start month) -- START
#> [Main Analysis] [COMPLETE] 620 (0.004078 secs)
#> Analysis 625 (Number of procedure_occurrence records by procedure_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 625 (0.003206 secs)
#> Analysis 630 (Number of procedure_occurrence records inside a valid observation period) -- START
#> [Main Analysis] [COMPLETE] 630 (0.002538 secs)
#> Analysis 691 (Percentage of total persons that have at least x procedures) -- START
#> [Main Analysis] [COMPLETE] 691 (0.005247 secs)
#> Analysis 700 (Number of persons with at least one drug exposure, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 700 (0.005061 secs)
#> Analysis 701 (Number of drug exposure records, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 701 (0.003855 secs)
#> Analysis 702 (Number of persons by drug exposure start month, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 702 (0.008609 secs)
#> Analysis 703 (Number of distinct drug exposure concepts per person) -- START
#> [Main Analysis] [COMPLETE] 703 (0.010909 secs)
#> Analysis 704 (Number of persons with at least one drug exposure, by drug_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 704 (0.010318 secs)
#> Analysis 705 (Number of drug exposure records, by drug_concept_id by drug_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 705 (0.004379 secs)
#> Analysis 706 (Distribution of age by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 706 (0.017753 secs)
#> Analysis 715 (Distribution of days_supply by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 715 (0.013023 secs)
#> Analysis 716 (Distribution of refills by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 716 (0.011751 secs)
#> Analysis 717 (Distribution of quantity by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 717 (0.011385 secs)
#> Analysis 720 (Number of drug exposure records  by drug exposure start month) -- START
#> [Main Analysis] [COMPLETE] 720 (0.005649 secs)
#> Analysis 725 (Number of drug_exposure records by drug_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 725 (0.004102 secs)
#> Analysis 791 (Percentage of total persons that have at least x drug exposures) -- START
#> [Main Analysis] [COMPLETE] 791 (0.006816 secs)
#> Analysis 800 (Number of persons with at least one observation occurrence, by observation_concept_id) -- START
#> [Main Analysis] [COMPLETE] 800 (0.002930 secs)
#> Analysis 801 (Number of observation occurrence records, by observation_concept_id) -- START
#> [Main Analysis] [COMPLETE] 801 (0.002623 secs)
#> Analysis 802 (Number of persons by observation occurrence start month, by observation_concept_id) -- START
#> [Main Analysis] [COMPLETE] 802 (0.003408 secs)
#> Analysis 803 (Number of distinct observation occurrence concepts per person) -- START
#> [Main Analysis] [COMPLETE] 803 (0.008034 secs)
#> Analysis 804 (Number of persons with at least one observation occurrence, by observation_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 804 (0.003838 secs)
#> Analysis 805 (Number of observation occurrence records, by observation_concept_id by observation_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 805 (0.002596 secs)
#> Analysis 806 (Distribution of age by observation_concept_id) -- START
#> [Main Analysis] [COMPLETE] 806 (0.012166 secs)
#> Analysis 807 (Number of observation occurrence records, by observation_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 807 (0.002725 secs)
#> Analysis 814 (Number of observation records with no value (numeric, string, or concept)) -- START
#> [Main Analysis] [COMPLETE] 814 (0.001679 secs)
#> Analysis 815 (Distribution of numeric values, by observation_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 815 (0.006070 secs)
#> Analysis 820 (Number of observation records  by observation start month) -- START
#> [Main Analysis] [COMPLETE] 820 (0.002598 secs)
#> Analysis 822 (Number of observation records, by observation_concept_id and value_as_concept_id) -- START
#> [Main Analysis] [COMPLETE] 822 (0.002386 secs)
#> Analysis 823 (Number of observation records, by observation_concept_id and qualifier_concept_id) -- START
#> [Main Analysis] [COMPLETE] 823 (0.002333 secs)
#> Analysis 825 (Number of observation records by observation_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 825 (0.002378 secs)
#> Analysis 826 (Number of observation records by value_as_concept_id) -- START
#> [Main Analysis] [COMPLETE] 826 (0.002224 secs)
#> Analysis 827 (Number of observation records by unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 827 (0.002105 secs)
#> Analysis 891 (Percentage of total persons that have at least x observations) -- START
#> [Main Analysis] [COMPLETE] 891 (0.003672 secs)
#> Analysis 900 (Number of persons with at least one drug era, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 900 (0.004394 secs)
#> Analysis 901 (Number of drug era records, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 901 (0.003376 secs)
#> Analysis 902 (Number of persons by drug era start month, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 902 (0.007540 secs)
#> Analysis 903 (Number of distinct drug era concepts per person) -- START
#> [Main Analysis] [COMPLETE] 903 (0.010048 secs)
#> Analysis 904 (Number of persons with at least one drug era, by drug_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 904 (0.008929 secs)
#> Analysis 906 (Distribution of age by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 906 (0.017857 secs)
#> Analysis 907 (Distribution of drug era length, by drug_concept_id) -- START
#> [Main Analysis] [COMPLETE] 907 (0.010943 secs)
#> Analysis 920 (Number of drug era records by drug era start month) -- START
#> [Main Analysis] [COMPLETE] 920 (0.004574 secs)
#> Analysis 1000 (Number of persons with at least one condition era, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1000 (0.004949 secs)
#> Analysis 1001 (Number of condition era records, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1001 (0.006122 secs)
#> Analysis 1002 (Number of persons by condition era start month, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1002 (0.009203 secs)
#> Analysis 1003 (Number of distinct condition era concepts per person) -- START
#> [Main Analysis] [COMPLETE] 1003 (0.010836 secs)
#> Analysis 1004 (Number of persons with at least one condition era, by condition_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 1004 (0.010688 secs)
#> Analysis 1006 (Distribution of age by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1006 (0.019482 secs)
#> Analysis 1007 (Distribution of condition era length, by condition_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1007 (0.014573 secs)
#> Analysis 1020 (Number of condition era records by condition era start month) -- START
#> [Main Analysis] [COMPLETE] 1020 (0.004987 secs)
#> Analysis 1100 (Number of persons by location 3-digit zip) -- START
#> [Main Analysis] [COMPLETE] 1100 (0.001733 secs)
#> Analysis 1101 (Number of persons by location state) -- START
#> [Main Analysis] [COMPLETE] 1101 (0.001642 secs)
#> Analysis 1102 (Number of care sites by location 3-digit zip) -- START
#> [Main Analysis] [COMPLETE] 1102 (0.001620 secs)
#> Analysis 1103 (Number of care sites by location state) -- START
#> [Main Analysis] [COMPLETE] 1103 (0.001664 secs)
#> Analysis 1200 (Number of persons by place of service) -- START
#> [Main Analysis] [COMPLETE] 1200 (0.001629 secs)
#> Analysis 1201 (Number of visits by place of service) -- START
#> [Main Analysis] [COMPLETE] 1201 (0.001534 secs)
#> Analysis 1202 (Number of care sites by place of service) -- START
#> [Main Analysis] [COMPLETE] 1202 (0.001753 secs)
#> Analysis 1203 (Number of visits by place of service discharge type) -- START
#> [Main Analysis] [COMPLETE] 1203 (0.001937 secs)
#> Analysis 1300 (Number of persons with at least one visit detail, by visit_detail_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1300 (0.001554 secs)
#> Analysis 1301 (Number of visit detail records, by visit_detail_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1301 (0.001548 secs)
#> Analysis 1302 (Number of persons by visit detail start month, by visit_detail_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1302 (0.001634 secs)
#> Analysis 1303 (Number of distinct visit detail concepts per person) -- START
#> [Main Analysis] [COMPLETE] 1303 (0.004037 secs)
#> Analysis 1304 (Number of persons with at least one visit detail, by visit_detail_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 1304 (0.001889 secs)
#> Analysis 1306 (Distribution of age by visit_detail_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1306 (0.004400 secs)
#> Analysis 1312 (Number of persons with at least one visit detail, by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 1312 (0.001845 secs)
#> Analysis 1313 (Distribution of length of stay by visit_detail_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1313 (0.003970 secs)
#> Analysis 1320 (Number of visit detail records by visit detail start month) -- START
#> [Main Analysis] [COMPLETE] 1320 (0.001670 secs)
#> Analysis 1321 (Number of persons by visit start year) -- START
#> [Main Analysis] [COMPLETE] 1321 (0.001581 secs)
#> Analysis 1325 (Number of visit_detail records by visit_detail_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1325 (0.001528 secs)
#> Analysis 1326 (Number of records by domain by visit_detail_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1326 (0.003770 secs)
#> Analysis 1406 (Length of payer plan (days) of first payer plan period by gender) -- START
#> [Main Analysis] [COMPLETE] 1406 (0.004020 secs)
#> Analysis 1407 (Length of payer plan (days) of first payer plan period by age decile) -- START
#> [Main Analysis] [COMPLETE] 1407 (0.004044 secs)
#> Analysis 1408 (Number of persons by length of payer plan period, in 30d increments) -- START
#> [Main Analysis] [COMPLETE] 1408 (0.003794 secs)
#> Analysis 1409 (Number of persons with continuous payer plan in each year) -- START
#> [Main Analysis] [COMPLETE] 1409 (0.002566 secs)
#> Analysis 1410 (Number of persons with continuous payer plan in each month) -- START
#> [Main Analysis] [COMPLETE] 1410 (0.002728 secs)
#> Analysis 1411 (Number of persons by payer plan period start month) -- START
#> [Main Analysis] [COMPLETE] 1411 (0.001572 secs)
#> Analysis 1412 (Number of persons by payer plan period end month) -- START
#> [Main Analysis] [COMPLETE] 1412 (0.001546 secs)
#> Analysis 1413 (Number of persons by number of payer plan periods) -- START
#> [Main Analysis] [COMPLETE] 1413 (0.001478 secs)
#> Analysis 1425 (Number of payer_plan_period records by payer_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1425 (0.001371 secs)
#> Analysis 1800 (Number of persons with at least one measurement occurrence, by measurement_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1800 (0.004225 secs)
#> Analysis 1801 (Number of measurement occurrence records, by measurement_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1801 (0.003260 secs)
#> Analysis 1802 (Number of persons by measurement occurrence start month, by measurement_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1802 (0.006527 secs)
#> Analysis 1803 (Number of distinct mesurement occurrence concepts per person) -- START
#> [Main Analysis] [COMPLETE] 1803 (0.009629 secs)
#> Analysis 1804 (Number of persons with at least one mesurement  occurrence, by measurement_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 1804 (0.007569 secs)
#> Analysis 1805 (Number of measurement occurrence records, by measurement_concept_id by measurement_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1805 (0.003584 secs)
#> Analysis 1806 (Distribution of age by measurement_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1806 (0.016100 secs)
#> Analysis 1807 (Number of measurement occurrence records, by measurement_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1807 (0.003666 secs)
#> Analysis 1811 (Number of measurement records with a valid value (with a mapped, non-null value_as_number)) -- START
#> [Main Analysis] [COMPLETE] 1811 (0.002173 secs)
#> Analysis 1814 (Number of measurement records with no value (numeric, string, or concept)) -- START
#> [Main Analysis] [COMPLETE] 1814 (0.002979 secs)
#> Analysis 1815 (Distribution of numeric values, by measurement_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1815 (0.005932 secs)
#> Analysis 1816 (Distribution of low range, by measurement_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1816 (0.006399 secs)
#> Analysis 1817 (Distribution of high range, by measurement_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1817 (0.006610 secs)
#> Analysis 1818 (Number of measurement records below/within/above normal range, by measurement_concept_id and unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1818 (0.003010 secs)
#> Analysis 1819 (Number of measurement records, by measurement_concept_id, with a valid value (with a mapped, non-null value_as_number)) -- START
#> [Main Analysis] [COMPLETE] 1819 (0.002026 secs)
#> Analysis 1820 (Number of measurement records  by measurement start month) -- START
#> [Main Analysis] [COMPLETE] 1820 (0.004084 secs)
#> Analysis 1821 (Number of measurement records with no numeric value) -- START
#> [Main Analysis] [COMPLETE] 1821 (0.001690 secs)
#> Analysis 1822 (Number of measurement records, by measurement_concept_id and value_as_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1822 (0.003450 secs)
#> Analysis 1823 (Number of measurement records, by measurement_concept_id and operator_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1823 (0.003417 secs)
#> Analysis 1825 (Number of measurement records by measurement_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1825 (0.003280 secs)
#> Analysis 1826 (Number of measurement records by value_as_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1826 (0.005236 secs)
#> Analysis 1827 (Number of measurement records by unit_concept_id) -- START
#> [Main Analysis] [COMPLETE] 1827 (0.003001 secs)
#> Analysis 1891 (Percentage of total persons that have at least x measurements) -- START
#> [Main Analysis] [COMPLETE] 1891 (0.006376 secs)
#> Analysis 1900 (Source values mapped to concept_id 0 by table, by column, by source_value) -- START
#> [Main Analysis] [COMPLETE] 1900 (0.009957 secs)
#> Analysis 2000 (Number of patients with at least 1 Dx and 1 Rx) -- START
#> [Main Analysis] [COMPLETE] 2000 (0.005993 secs)
#> Analysis 2001 (Number of patients with at least 1 Dx and 1 Proc) -- START
#> [Main Analysis] [COMPLETE] 2001 (0.004754 secs)
#> Analysis 2002 (Number of patients with at least 1 Meas, 1 Dx and 1 Rx) -- START
#> [Main Analysis] [COMPLETE] 2002 (0.005955 secs)
#> Analysis 2003 (Number of patients with at least 1 Visit) -- START
#> [Main Analysis] [COMPLETE] 2003 (0.002282 secs)
#> Analysis 2004 (Number of distinct patients that overlap between specific domains) -- START
#> [Main Analysis] [COMPLETE] 2004 (0.084734 secs)
#> Analysis 2100 (Number of persons with at least one device exposure, by device_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2100 (0.002200 secs)
#> Analysis 2101 (Number of device exposure records, by device_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2101 (0.001813 secs)
#> Analysis 2102 (Number of persons by device records  start month, by device_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2102 (0.001853 secs)
#> Analysis 2104 (Number of persons with at least one device exposure, by device_concept_id by calendar year by gender by age decile) -- START
#> [Main Analysis] [COMPLETE] 2104 (0.002038 secs)
#> Analysis 2105 (Number of device exposure records, by device_concept_id by device_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2105 (0.001693 secs)
#> Analysis 2106 (Distribution of age by device_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2106 (0.004980 secs)
#> Analysis 2120 (Number of device_exposure records by device_exposure start month) -- START
#> [Main Analysis] [COMPLETE] 2120 (0.001736 secs)
#> Analysis 2125 (Number of device_exposure records by device_source_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2125 (0.001657 secs)
#> Analysis 2191 (Percentage of total persons that have at least x device exposures) -- START
#> [Main Analysis] [COMPLETE] 2191 (0.001805 secs)
#> Analysis 2200 (Number of persons with at least one note by  note_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2200 (0.001482 secs)
#> Analysis 2201 (Number of note records, by note_type_concept_id) -- START
#> [Main Analysis] [COMPLETE] 2201 (0.001536 secs)
#> Merging scratch Achilles tables
#> Done. Achilles results can now be found in schema main
#> Dropping scratch Achilles tables from schema #
#> Temporary Achilles tables removed from schema #
#> [Total Runtime] 4.094722 secs

# now you can reconnect with CDMConnector and see the achilles tables

con <- DBI::dbConnect(duckdb::duckdb(), duckdb_file)
cdm <- cdm_from_con(con, "main", "main", achilles_schema = "main")
#> Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
#>  target signature 'duckdb_connection#Id'.
#>  "duckdb_connection#ANY" would also be valid

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength
#> • cohort tables: -
#> • achilles tables: achilles_analysis, achilles_results, achilles_results_dist
#> • other tables: -

cdm$achilles_results
#> # Source:   table<main.achilles_results> [?? x 7]
#> # Database: DuckDB v1.0.0 [root@Darwin 23.1.0:R 4.3.3//private/var/folders/2j/8z0yfn1j69q8sxjc7vj9yhz40000gp/T/RtmpcrkxAJ/file13df548e6f84d.duckdb]
#>    analysis_id stratum_1     stratum_2 stratum_3 stratum_4 stratum_5 count_value
#>          <int> <chr>         <chr>     <chr>     <chr>     <chr>           <dbl>
#>  1           0 Eunomia_gibl… 1.7.2     2024-09-… <NA>      <NA>             2694
#>  2     2000000 0.01          <NA>      <NA>      <NA>      <NA>                6
#>  3           1 <NA>          <NA>      <NA>      <NA>      <NA>             2694
#>  4     2000001 0             <NA>      <NA>      <NA>      <NA>                6
#>  5           2 8532          <NA>      <NA>      <NA>      <NA>             1373
#>  6           2 8507          <NA>      <NA>      <NA>      <NA>             1321
#>  7     2000002 0             <NA>      <NA>      <NA>      <NA>                6
#>  8           3 1950          <NA>      <NA>      <NA>      <NA>               55
#>  9           3 1958          <NA>      <NA>      <NA>      <NA>               68
#> 10           3 1982          <NA>      <NA>      <NA>      <NA>               19
#> # ℹ more rows

DBI::dbDisconnect(con)

Created on 2024-09-09 with reprex v2.1.1

ablack3 commented 1 month ago

Closing this issue since SQLite will not be supported. Duckdb is supported instead. I'll add an error if someone tries to use sqlite with cdmFromCon in the next release.