darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0

cdm_flatten() on a subset can give incorrect "table is empty" errors #20

Open SulevR opened 6 months ago

SulevR commented 6 months ago

Calling cdm_flatten() on a CDM runs a table check for each domain, for example:

assert_tables(cdm, "drugs")

and gives an error if the table is empty:

Error in `assert_tables()`:
! - drug_exposure cdm table is empty

However, when the function is applied to a small subset, some tables can legitimately be empty for that particular subset. This is expected and should not produce an error.

Reproducible example:

library(CDMConnector)
library(dplyr)  # provides %>%

# Keep the Eunomia example data under ./eunomia
Sys.setenv("EUNOMIA_DATA_FOLDER" = file.path(getwd(), "eunomia"))
cdm_andmebaasi_faili_path <- eunomia_dir("GiBleed")
con <- DBI::dbConnect(duckdb::duckdb(), cdm_andmebaasi_faili_path)
cdmTest <- cdm_from_con(con, cdm_name = "eunomia", cdm_schema = "main", write_schema = "main")

# test: restrict the CDM to a single person, then flatten it
subset <- cdmTest %>%
  cdm_subset(person_id = 67)

b <- cdm_flatten(subset)

EXPECTED RESULT: No error

ACTUAL RESULT:

Error in `assert_tables()`:
! - procedure_occurrence cdm table is empty
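
A possible workaround until this is fixed, as a minimal sketch: check which domain tables actually contain rows in the subset and pass only those to cdm_flatten(). The helper `non_empty_domains()` is hypothetical and not part of CDMConnector. The domain-to-table mapping and the `domain` argument of cdm_flatten() are assumptions taken from my reading of the package documentation, so please verify them against your installed version.

```r
# Hypothetical helper (not part of CDMConnector): returns the names of the
# domains whose backing table has at least one row.
# The default domain-to-table mapping is an assumption.
non_empty_domains <- function(cdm,
                              domain_tables = c(
                                condition = "condition_occurrence",
                                drug      = "drug_exposure",
                                procedure = "procedure_occurrence"
                              )) {
  has_rows <- vapply(domain_tables, function(table_name) {
    tbl <- cdm[[table_name]]
    if (is.null(tbl)) {
      return(FALSE)  # table not present in this cdm object
    }
    # Fetch at most one row so the check stays cheap on database-backed tables.
    nrow(dplyr::collect(utils::head(tbl, 1))) > 0
  }, logical(1))

  names(domain_tables)[has_rows]
}

# Usage with the subset from the example above (assumes cdm_flatten()
# accepts a `domain` argument):
# domains <- non_empty_domains(subset)
# if (length(domains) > 0) {
#   b <- cdm_flatten(subset, domain = domains)
# }
```

This only sidesteps the check. It does not change the behaviour of assert_tables() itself, which still errors if an empty domain is passed.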

ablack3 commented 4 weeks ago

Thank you for this example and reprex. I agree it should be fixed and have added it to our backlog.