darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0

Question: How to add a database table to the cdm reference #25

Closed eric-fey-hus closed 2 months ago

eric-fey-hus commented 2 months ago

Hi, this is a question; I hope you can help. I have a cdm object. In the source database I have a table mytable under the schema omopresults. I can see it with listTables(cdm), but cdm$mytable does not work. How can I add it so that I can use it like this?

vajnie commented 2 months ago

Hi Eric,

You need to insert the table into the cdm. This is a script I use to load cohort tables that are not in the cohort_definition, complete with attributes. It's not the cleanest, but it works.

# READ A COHORT IN AND SET IT UP
# https://cran.r-project.org/web/packages/CDMConnector/vignettes/a02_cohorts.html see examples
#@name: the name, without .csv. this is the name of the file and the name with which its inserted to cdm
# ----------------------------------------------------------------------------------------------------

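library(stringr)
library(readr)        # read_csv(), cols(), col_skip()
library(dplyr)        # %>%, mutate(), compute()
library(CDMConnector) # cdm helpers used below; `cdm` is assumed to exist already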

# Choose a file and get its basename without .csv
path = file.choose()
# Escape the dot and anchor at the end so that only a trailing ".csv" extension is removed
new_cohort_name = str_replace(basename(path), "\\.csv$", "") %>% str_replace(pattern = "cohort_", "")

# Derive a cohort_definition_id from the file name as the sum of its character codes.
# Note: this is not guaranteed to be unique (names that are anagrams of each other get the same id).
new_cohort_definition_id <- sum(utf8ToInt(new_cohort_name))

loaded_cohort_name = "loaded_cohort" # just a temp name we use for a bit before changing it.

loaded_table <- read_csv(path, 
                         col_types = cols(...1 = col_skip()))

# Insert the table into the cdm object under a temporary name, so that its attrition and cohort settings can be read before the final load
cdm <- insertTable(cdm = cdm, name = loaded_cohort_name, table = loaded_table, overwrite = TRUE) # add the table to the cdm object
cdm[[loaded_cohort_name]] <- newCohortTable(cdm[[loaded_cohort_name]]) # make it a cohort table so attrition etc. are available

# Get temporary attrition and cohortSetRef info (neither object is used again below)
temp_attrition <- attrition(cdm[[loaded_cohort_name]]) %>%
  mutate(cohort_definition_id = new_cohort_definition_id) %>%
  compute() %>%
  mutate(cohort_name = new_cohort_name)
temp_cohortSetRef <- (cdm[[loaded_cohort_name]]) %>%
  mutate(cohort_definition_id = new_cohort_definition_id) %>%
  mutate(cohort_name = new_cohort_name)
  # %>% compute(name = new_cohort_name, temporary = FALSE, overwrite = TRUE)

# Change the cohort definition id and the cohort name. 
# Important: new_generated_cohort_set is required for the settings(your_cohort_table) to be updated. 
cdm[[new_cohort_name]] <- cdm[[loaded_cohort_name]] %>%
  mutate(cohort_definition_id = new_cohort_definition_id) %>%
  compute(name = new_cohort_name, temporary = FALSE, overwrite = TRUE) %>%
  new_generated_cohort_set()

eric-fey-hus commented 2 months ago

@vajnie Thanks for the detailed instructions :-). My problem was that the table was already in the schema on the db, just not referenced in the cdm object, and I did not want to recreate or reload the table (it took too long in this case). I found a quick and easy way to do that. See https://github.com/oxford-pharmacoepi/MegaStudy/issues/79#issuecomment-2361791736, also copied below.

cdm <- cdmFromCon(
  con = db,
  cdmSchema = c(schema = cdmSchema),
  writeSchema = c(schema = writeSchema, prefix = writePrefix),
  cohortTables = c('covid', 'neutropenia', [...], 
                   'inc_pat'),
  cdmName = dbName
)

The code requires that the listed cohort tables already exist in the database.
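
For a table that is not a cohort table, a plain dplyr/dbplyr reference may be enough to query it without reloading anything. The following is only a minimal sketch, not a CDMConnector feature: it uses an in-memory SQLite database as a stand-in for the real connection, and the schema and table names (omopresults, mytable) are taken from the question, while the columns and values are made up for illustration. It does not add the table to the cdm object itself; it only gives a lazy reference that can be piped like cdm tables.

```r
library(DBI)
library(dplyr)

# Stand-in for the real database connection (assumption: in-memory SQLite)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create a schema "omopresults" with a small illustrative table "mytable"
dbExecute(con, "ATTACH DATABASE ':memory:' AS omopresults")
dbExecute(con, "CREATE TABLE omopresults.mytable (subject_id INTEGER, value REAL)")
dbExecute(con, "INSERT INTO omopresults.mytable VALUES (1, 0.5), (2, 1.5)")

# Lazy reference to the existing table; no data is copied or reloaded
mytable <- tbl(con, dbplyr::in_schema("omopresults", "mytable"))

# The reference can be used in a pipe; collect() runs the query and returns a tibble
result <- mytable %>%
  filter(value > 1) %>%
  collect()
print(result)

dbDisconnect(con)
```

With a real connection, only the tbl() line would be needed, using the same connection object that was passed to cdmFromCon.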