OHDSI / circe-be

CIRCE is a cohort definition and syntax compiler tool for OMOP CDMv5
Apache License 2.0
9 stars 13 forks source link

Temporary cohort and stats tables? #187

Closed ablack3 closed 1 year ago

ablack3 commented 1 year ago

Is it within scope to use temporary tables for cohort and stats tables?

Currently we would need a bit of a hack like

library(SqlRender)
sql <- "DELETE FROM @target_database_schema.@target_cohort_table"
render(sql, target_database_schema. = "", target_cohort_table = '#temp_cohort')
#> [1] "DELETE FROM #temp_cohort"

tagging @edward-burn @Tsemharb

chrisknoll commented 1 year ago

So, this is a similar function that exists in CohortIncidence.

I added temp table mode in this PR., but it also included functionality for strata settings. But if you see what executeAnalysis is doing:

  1. Forces useTempTables to TRUE
  2. Gets the resultsDDL in temp table mode (ie: create table changes to create temp table)
  3. Execute the DDL to create the temp tables
  4. get the analysis sql in temp table mode (results will write to temp tables)
  5. While connection is still open, we will query results via: select * from #incidence_summary
  6. Execute cleanupSQL to remove the results temp tables (just to be a good citizen)
  7. Close connection

A generic flow is:

  1. Create result DDL as temp tables
  2. Execute the query as normal but specify settings to make INSERTS into the temp tables (like @ablack3 describes)
  3. Do something with the temp tables while they live (download to CSV or move them to another perm table)
  4. Drop temp tables
  5. Close connection

So, if we do something like this in CIRCE, people need to realize that it's not a simple 'build sql' anymore. There's multiple steps: fetch the results schema DDL as temp tables, replace params to write to temp tables, fetch results from temp tables, clean up temp tables. This library will need functions to support these operations.

Also, in CohortIncidence, it's all or nothing: you can't say some of the tables should be temporary, but others should be permanent . I bring this point up because your request is to have stats tables go into a temporary space, but I assume you mean to leave the cohort table result as a permanent result. To deal with this, we'd still have the results schema all temp, but then you'd have to make a final post-analysis step to copy from the temp results table into a permanent table before dropping the temp tables.

Update: I just read your update to the title, and you do mean to make cohort and stats table temporary.

ablack3 commented 1 year ago

Thanks @chrisknoll! Very helpful info. This is a request from @edward-burn so I'll let him add context. My understanding is that we would have the option to use temp tables for all cohort and stats tables. It seems like this possible using SQLRender without any changes to Circe but I wanted to check that this is ok to do and if it should be an option when creating the cohort generation sql.

chrisknoll commented 1 year ago

Yes, you can do it by using sqlrender paramaterization, just ensure you're hitting all the right tables (the cohort table is only one of several). It is what would be required if we wanted a 'temp table' mode of the sql generation.