Open edward-burn opened 4 weeks ago
I would be hesitant to print it in the attrition, because you cannot create a cohort with records that are not in observation.
But for IncidencePrevalence it has been useful in the past to know more about how the initial cohort was created, to pick up potential ETL problems etc. on the data partner side. For me, this would be a nice way of prompting the data partner to realise the importance of having records within observation.
``` r
library(IncidencePrevalence)
cdm <- mockIncidencePrevalenceRef()
cdm <- generateDenominatorCohortSet(cdm, "denom")
#> Loading required namespace: testthat
#> ℹ Creating denominator cohorts
#> ✔ Cohorts created in 0 min and 2 sec
attrition(cdm$denom) |>
  dplyr::glimpse()
#> Rows: 8
#> Columns: 7
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1
#> $ number_records       <int> 1, 1, 1, 1, 1, 1, 1, 1
#> $ number_subjects      <int> 1, 1, 1, 1, 1, 1, 1, 1
#> $ reason_id            <int> 1, 2, 3, 4, 5, 6, 7, 10
#> $ reason               <chr> "Starting population", "Missing year of birth", "…
#> $ excluded_records     <int> NA, 0, 0, 0, 0, 0, 0, 0
#> $ excluded_subjects    <int> NA, 0, 0, 0, 0, 0, 0, 0
```
Created on 2024-06-03 with reprex v2.1.0
So that the cohort table satisfies the OMOP CDM cohort requirements, we drop records that start outside of an observation period. It would be nice, I think, if we printed a cli message telling the user how many records were dropped for this reason, ideally with both the n and the % of total records (the latter is likely a good indicator of the size of the impact).
We could also include this in the attrition attribute.
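
To make the proposal concrete, here is a rough sketch of what such a message could look like. `reportDroppedRecords()` is a hypothetical helper, not an existing function in any package, and it assumes the total and dropped record counts have already been computed elsewhere; only `cli::cli_alert_warning()` and `cli::cli_alert_info()` are real cli calls.

```r
library(cli)

# Hypothetical helper (an assumption for illustration, not package code):
# reports how many records were dropped because they start outside of an
# observation period, as both a count and a percentage of all records.
reportDroppedRecords <- function(nTotal, nDropped) {
  stopifnot(nTotal >= 0, nDropped >= 0, nDropped <= nTotal)

  # Guard against division by zero when there are no records at all
  pct <- if (nTotal > 0) round(100 * nDropped / nTotal, 1) else 0

  msg <- sprintf(
    "%s of %s records (%s%%) dropped because they start outside of an observation period",
    nDropped, nTotal, pct
  )

  # Warn only when something was actually dropped
  if (nDropped > 0) {
    cli::cli_alert_warning("{msg}")
  } else {
    cli::cli_alert_info("{msg}")
  }

  # Return the numbers invisibly so they could also feed an attrition row
  invisible(list(n = nDropped, percent = pct))
}

res <- reportDroppedRecords(nTotal = 200, nDropped = 5)
```

With 200 records of which 5 are dropped, the helper reports 5 records and 2.5%. Returning the values invisibly keeps open the option of also recording them in the attrition, if that turns out to be wanted.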