OHDSI / ClinicalCharacteristics

[under development] table shell approach to OMOP characterization
Apache License 2.0

Bug when patient doesn't qualify for any categories of a variable with "breaks" #15

Status: Closed (katy-sadowski closed 2 months ago)

katy-sadowski commented 5 months ago

If the cohort contains a patient who does not qualify for any category of a characteristic defined with "breaks" (for example, a patient aged 20 when the only categories in an age characteristic are 0-10 and 11-19), ClinChar throws an error.

This is because cohort data is left joined to the breaks data: https://github.com/OHDSI/ClinicalCharacteristics/blob/6ac4deaf485a9cfb1f5bd3e52f01f47ed4dc2e20/R/conversion.R#L247-L266

But the final data table has a NOT NULL constraint on all columns: https://github.com/OHDSI/ClinicalCharacteristics/blob/6ac4deaf485a9cfb1f5bd3e52f01f47ed4dc2e20/R/clinChar.R#L88-L89

So the script fails whenever the left join leaves a row with a NULL value_id.
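
The failure mode described above can be reproduced outside the package. The following is a minimal sketch using Python's built-in `sqlite3` module, not the package's actual R/SQL code. The table names (`cohort`, `breaks`, `result`) and the columns `lo` and `hi` are assumptions made for illustration. Only `value_id` and the NOT NULL constraint come from the report.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical cohort: subject 3 is aged 20 and fits no break.
    CREATE TABLE cohort (subject_id INTEGER, age INTEGER);
    INSERT INTO cohort VALUES (1, 5), (2, 15), (3, 20);

    -- Hypothetical breaks table: categories 0-10 and 11-19 only.
    CREATE TABLE breaks (value_id INTEGER, lo INTEGER, hi INTEGER);
    INSERT INTO breaks VALUES (1, 0, 10), (2, 11, 19);

    -- Stand-in for the final data table with NOT NULL on all columns.
    CREATE TABLE result (
        subject_id INTEGER NOT NULL,
        value_id   INTEGER NOT NULL
    );
""")

left_join = """
    SELECT c.subject_id, b.value_id
    FROM cohort c
    LEFT JOIN breaks b ON c.age BETWEEN b.lo AND b.hi
    ORDER BY c.subject_id
"""

# The left join keeps subject 3, but with value_id = NULL (None in Python).
rows = con.execute(left_join).fetchall()

# Writing those rows into the constrained table raises an IntegrityError.
try:
    con.execute("INSERT INTO result " + left_join)
    insert_error = None
except sqlite3.IntegrityError as exc:
    insert_error = str(exc)

print(rows)
print(insert_error)
```

The unmatched subject survives the join with a NULL `value_id`, and it is the insert into the constrained table that raises the error.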

mdlavallee92 commented 5 months ago

So maybe a full join, collapsing the missing rows into an "other" group? Trying to think of a way to resolve this.
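
One way to picture the "other" group idea is sketched below with Python's built-in `sqlite3` module. It is an illustration under assumptions, not the package's implementation. The table and column names are hypothetical, and so is the sentinel `OTHER_ID`. The sketch keeps the left join and only collapses unmatched cohort rows. A full join would additionally keep break categories that no subject falls into.

```python
import sqlite3

OTHER_ID = -999  # hypothetical sentinel for the "other" category

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cohort (subject_id INTEGER, age INTEGER);
    INSERT INTO cohort VALUES (1, 5), (2, 15), (3, 20);

    CREATE TABLE breaks (value_id INTEGER, lo INTEGER, hi INTEGER);
    INSERT INTO breaks VALUES (1, 0, 10), (2, 11, 19);

    CREATE TABLE result (
        subject_id INTEGER NOT NULL,
        value_id   INTEGER NOT NULL
    );
""")

# COALESCE replaces the NULL produced by the left join with the sentinel,
# so every cohort row satisfies the NOT NULL constraint.
con.execute(
    """
    INSERT INTO result
    SELECT c.subject_id, COALESCE(b.value_id, ?)
    FROM cohort c
    LEFT JOIN breaks b ON c.age BETWEEN b.lo AND b.hi
    """,
    (OTHER_ID,),
)

rows = con.execute(
    "SELECT subject_id, value_id FROM result ORDER BY subject_id"
).fetchall()
print(rows)
```

With this approach no subject is dropped, at the cost of introducing a category that is not defined in the breaks themselves.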

katy-sadowski commented 5 months ago

If we want the full picture of missing data / empty categories, maybe we could:

  1. Cross join cohort IDs and break IDs
  2. Left join breaks table to that on break ID
  3. Full join cohort table to that on value
  4. Collapse cohort rows missing a category into 1 row per cohort; collapse break rows missing a subject into 1 row per cohort
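
The steps above can be sketched roughly as follows, using Python's built-in `sqlite3` module. This is one reading of the proposal rather than a definitive implementation. All table and column names are assumptions, and `UNMATCHED_ID` is a hypothetical sentinel. The sketch also swaps the full join for a cross-joined grid plus a separate "unmatched" query, and it reports empty categories as zero counts per break instead of collapsing them into one row.

```python
import sqlite3

UNMATCHED_ID = -999  # hypothetical sentinel for subjects outside all breaks

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cohort (cohort_id INTEGER, subject_id INTEGER, age INTEGER);
    INSERT INTO cohort VALUES (1, 1, 5), (1, 2, 15), (1, 3, 20), (2, 4, 7);

    -- Break 3 (60-69) is deliberately empty for both cohorts.
    CREATE TABLE breaks (value_id INTEGER, lo INTEGER, hi INTEGER);
    INSERT INTO breaks VALUES (1, 0, 10), (2, 11, 19), (3, 60, 69);
""")

query = """
    WITH cohort_ids AS (
        SELECT DISTINCT cohort_id FROM cohort
    ),
    grid AS (  -- every cohort paired with every break
        SELECT ci.cohort_id, b.value_id, b.lo, b.hi
        FROM cohort_ids ci
        CROSS JOIN breaks b
    ),
    matched AS (  -- counts per cell; empty cells count as 0
        SELECT g.cohort_id, g.value_id, COUNT(c.subject_id) AS n
        FROM grid g
        LEFT JOIN cohort c
               ON c.cohort_id = g.cohort_id
              AND c.age BETWEEN g.lo AND g.hi
        GROUP BY g.cohort_id, g.value_id
    ),
    unmatched AS (  -- subjects in no break, one row per cohort
        SELECT c.cohort_id, ? AS value_id, COUNT(*) AS n
        FROM cohort c
        WHERE NOT EXISTS (
            SELECT 1 FROM breaks b WHERE c.age BETWEEN b.lo AND b.hi
        )
        GROUP BY c.cohort_id
    )
    SELECT * FROM matched
    UNION ALL
    SELECT * FROM unmatched
    ORDER BY cohort_id, value_id
"""

summary = con.execute(query, (UNMATCHED_ID,)).fetchall()
for row in summary:
    print(row)  # (cohort_id, value_id, n)
```

Every row in the summary has a non-NULL `value_id`, so it would fit a table with NOT NULL constraints while still showing both empty categories and uncategorized subjects.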

Not sure if that's overkill though. I think it would also be fine to use an inner join instead of a left join, knowing that the output will only represent the overlap of cohorts<>breaks. Pretty sure this is what the ATLAS Characterization exports do.
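
For comparison, the inner join alternative can be sketched with Python's built-in `sqlite3` module. As before, the table and column names are assumptions made for illustration and are not taken from the package. The sketch also counts how many subjects the inner join silently drops, since that is the trade-off of this approach.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cohort (subject_id INTEGER, age INTEGER);
    INSERT INTO cohort VALUES (1, 5), (2, 15), (3, 20);

    CREATE TABLE breaks (value_id INTEGER, lo INTEGER, hi INTEGER);
    INSERT INTO breaks VALUES (1, 0, 10), (2, 11, 19);

    CREATE TABLE result (
        subject_id INTEGER NOT NULL,
        value_id   INTEGER NOT NULL
    );
""")

# An inner join only emits rows where a break matches, so no NULL
# value_id can reach the constrained table.
con.execute("""
    INSERT INTO result
    SELECT c.subject_id, b.value_id
    FROM cohort c
    INNER JOIN breaks b ON c.age BETWEEN b.lo AND b.hi
""")

rows = con.execute(
    "SELECT subject_id, value_id FROM result ORDER BY subject_id"
).fetchall()

# Subjects present in the cohort but absent from the output.
dropped = con.execute("""
    SELECT COUNT(*) FROM cohort
    WHERE subject_id NOT IN (SELECT subject_id FROM result)
""").fetchone()[0]

print(rows, dropped)
```

The insert succeeds, but the subject aged 20 no longer appears anywhere in the output, so category counts will not sum to the cohort size.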