darwin-eu-dev / omopgenerics

https://darwin-eu-dev.github.io/omopgenerics/
Apache License 2.0

Implement suppress() for cohort attrition #330

Closed: ablack3 closed this issue 5 months ago

ablack3 commented 5 months ago

IQVIA requires that no cell in the cohort attrition table with a value below 5 be reported. Additionally, it should be impossible to deduce the value of a suppressed cell from the values of the other cells. I'm not sure whether anyone else has encountered this, but on the polypharmacy study we had to apply custom suppression to the cohort_attrition table to meet this requirement.

The logic we implemented was: "If any cell in the attrition of a cohort (within a given database) is between 1 and 4 (inclusive), we suppress all values for that cohort except the final count."
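
To see why the whole cohort is suppressed rather than only the small cell, here is a minimal base-R sketch (illustrative only, not part of the study code). It uses the subject counts of the ebb / aml cohort from the data in the reprex: if only the excluded count of 3 were hidden, it could be recomputed from the consecutive number_subjects values that remain visible.

```r
# Subject counts for one cohort, copied from the "ebb" / "aml" rows of the reprex
# (steps: "Qualifying initial records", "no prior cancer ...", "Age >= 18")
number_subjects   <- c(55, 52, 52)
excluded_subjects <- c(0, 3, 0)

# Naive suppression: hide only the excluded counts between 1 and 4
naive_excluded <- excluded_subjects
naive_excluded[naive_excluded >= 1 & naive_excluded <= 4] <- NA
print(naive_excluded)

# The hidden cell is recoverable: excluded at step k equals
# number at step k - 1 minus number at step k (step 1 excludes nobody)
recovered <- c(0, -diff(number_subjects))
print(recovered)
```

Because the excluded counts can be rebuilt from the differences between steps, the reprex also blanks number_records and number_subjects for every step except the last.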

Here is a reprex with a possible implementation. It would be possible to suppress less information by combining categories, but this implementation was quick and easy, so it is what we used for the study.

Tagging @JTBrash

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

cohort_attrition <- tibble::tribble(
           ~cdm_name, ~cohort_definition_id, ~cohort_name, ~number_records, ~number_subjects, ~reason_id,                                    ~reason, ~excluded_records, ~excluded_subjects,
              "ipci",                    1L,        "aml",            249L,             249L,         1L,               "Qualifying initial records",                0L,                 0L,
              "ipci",                    1L,        "aml",            196L,             196L,         2L, "no prior cancer (exc. non-melanoma skin)",               53L,                53L,
              "ipci",                    1L,        "aml",            193L,             193L,         3L,                                "Age >= 18",                3L,                 3L,
              "ipci",                    8L,   "leukemia",           1278L,            1278L,         1L,               "Qualifying initial records",                0L,                 0L,
              "ipci",                    8L,   "leukemia",           1079L,            1079L,         2L, "no prior cancer (exc. non-melanoma skin)",              199L,               199L,
              "ipci",                    8L,   "leukemia",           1029L,            1029L,         3L,                                "Age >= 18",               50L,                50L,
   "cdm_gold_202307",                    1L,        "aml",            736L,             736L,         1L,               "Qualifying initial records",                0L,                 0L,
   "cdm_gold_202307",                    1L,        "aml",            531L,             531L,         2L, "no prior cancer (exc. non-melanoma skin)",              205L,               205L,
   "cdm_gold_202307",                    1L,        "aml",            517L,             517L,         3L,                                "Age >= 18",               14L,                14L,
   "cdm_gold_202307",                    8L,   "leukemia",           2886L,            2886L,         1L,               "Qualifying initial records",                0L,                 0L,
   "cdm_gold_202307",                    8L,   "leukemia",           2498L,            2498L,         2L, "no prior cancer (exc. non-melanoma skin)",              388L,               388L,
   "cdm_gold_202307",                    8L,   "leukemia",           2336L,            2336L,         3L,                                "Age >= 18",              162L,               162L,
            "sidiap",                    1L,        "aml",           2270L,            2270L,         1L,               "Qualifying initial records",                0L,                 0L,
            "sidiap",                    1L,        "aml",           1011L,            1011L,         2L, "no prior cancer (exc. non-melanoma skin)",             1259L,              1259L,
            "sidiap",                    1L,        "aml",            988L,             988L,         3L,                                "Age >= 18",               23L,                23L,
            "sidiap",                    8L,   "leukemia",           7686L,            7686L,         1L,               "Qualifying initial records",                0L,                 0L,
            "sidiap",                    8L,   "leukemia",           5313L,            5313L,         2L, "no prior cancer (exc. non-melanoma skin)",             2373L,              2373L,
            "sidiap",                    8L,   "leukemia",           5108L,            5108L,         3L,                                "Age >= 18",              205L,               205L,
               "ebb",                    1L,        "aml",             55L,              55L,         1L,               "Qualifying initial records",                0L,                 0L,
               "ebb",                    1L,        "aml",             52L,              52L,         2L, "no prior cancer (exc. non-melanoma skin)",                3L,                 3L,
               "ebb",                    1L,        "aml",             52L,              52L,         3L,                                "Age >= 18",                0L,                 0L,
               "ebb",                    8L,   "leukemia",            287L,             287L,         1L,               "Qualifying initial records",                0L,                 0L,
               "ebb",                    8L,   "leukemia",            202L,             202L,         2L, "no prior cancer (exc. non-melanoma skin)",               85L,                85L,
               "ebb",                    8L,   "leukemia",            202L,             202L,         3L,                                "Age >= 18",                0L,                 0L,
  "iqvia_germany_da",                    1L,        "aml",           1294L,            1294L,         1L,               "Qualifying initial records",                0L,                 0L,
  "iqvia_germany_da",                    1L,        "aml",            822L,             822L,         2L, "no prior cancer (exc. non-melanoma skin)",              472L,               472L,
  "iqvia_germany_da",                    1L,        "aml",            803L,             803L,         3L,                                "Age >= 18",               19L,                19L,
  "iqvia_germany_da",                    8L,   "leukemia",           9661L,            9661L,         1L,               "Qualifying initial records",                0L,                 0L,
  "iqvia_germany_da",                    8L,   "leukemia",           7735L,            7735L,         2L, "no prior cancer (exc. non-melanoma skin)",             1926L,              1926L,
  "iqvia_germany_da",                    8L,   "leukemia",           7345L,            7345L,         3L,                                "Age >= 18",              390L,               390L
  )

# Suppress the attrition of any cohort (per cdm_name) that contains a count
# between 1 and min_cell_count - 1, keeping only the final counts.
# Note: the final counts are always kept, even if they are below min_cell_count.
supress_cohort <- function(cohort_attrition, min_cell_count = 5) {
  is_small <- function(x) dplyr::between(x, 1, min_cell_count - 1)
  cohort_attrition %>%
    group_by(cohort_name, cdm_name) %>%
    mutate(
      max_reason_id = max(reason_id),
      # TRUE for every row of the group if any count in the group is small
      censor_group = any(
        is_small(excluded_subjects) | is_small(excluded_records) |
          is_small(number_records) | is_small(number_subjects),
        na.rm = TRUE
      )
    ) %>%
    ungroup() %>%
    mutate(
      # number_* are kept only for the last step (the final cohort count)
      number_records    = ifelse(censor_group & reason_id < max_reason_id, NA_real_, number_records),
      number_subjects   = ifelse(censor_group & reason_id < max_reason_id, NA_real_, number_subjects),
      # excluded_* are suppressed for every step of a censored cohort
      excluded_records  = ifelse(censor_group, NA_real_, excluded_records),
      excluded_subjects = ifelse(censor_group, NA_real_, excluded_subjects)
    ) %>%
    select(-max_reason_id, -censor_group)
}

# before censoring
cohort_attrition %>% 
  select(-cohort_definition_id) %>% 
  rename(n_rec = number_records, n_sub = number_subjects, 
         ex_rec = excluded_records, ex_subj = excluded_subjects) %>% 
  print(n=1e6)
#> # A tibble: 30 × 8
#>    cdm_name         cohort_name n_rec n_sub reason_id reason      ex_rec ex_subj
#>    <chr>            <chr>       <int> <int>     <int> <chr>        <int>   <int>
#>  1 ipci             aml           249   249         1 Qualifying…      0       0
#>  2 ipci             aml           196   196         2 no prior c…     53      53
#>  3 ipci             aml           193   193         3 Age >= 18        3       3
#>  4 ipci             leukemia     1278  1278         1 Qualifying…      0       0
#>  5 ipci             leukemia     1079  1079         2 no prior c…    199     199
#>  6 ipci             leukemia     1029  1029         3 Age >= 18       50      50
#>  7 cdm_gold_202307  aml           736   736         1 Qualifying…      0       0
#>  8 cdm_gold_202307  aml           531   531         2 no prior c…    205     205
#>  9 cdm_gold_202307  aml           517   517         3 Age >= 18       14      14
#> 10 cdm_gold_202307  leukemia     2886  2886         1 Qualifying…      0       0
#> 11 cdm_gold_202307  leukemia     2498  2498         2 no prior c…    388     388
#> 12 cdm_gold_202307  leukemia     2336  2336         3 Age >= 18      162     162
#> 13 sidiap           aml          2270  2270         1 Qualifying…      0       0
#> 14 sidiap           aml          1011  1011         2 no prior c…   1259    1259
#> 15 sidiap           aml           988   988         3 Age >= 18       23      23
#> 16 sidiap           leukemia     7686  7686         1 Qualifying…      0       0
#> 17 sidiap           leukemia     5313  5313         2 no prior c…   2373    2373
#> 18 sidiap           leukemia     5108  5108         3 Age >= 18      205     205
#> 19 ebb              aml            55    55         1 Qualifying…      0       0
#> 20 ebb              aml            52    52         2 no prior c…      3       3
#> 21 ebb              aml            52    52         3 Age >= 18        0       0
#> 22 ebb              leukemia      287   287         1 Qualifying…      0       0
#> 23 ebb              leukemia      202   202         2 no prior c…     85      85
#> 24 ebb              leukemia      202   202         3 Age >= 18        0       0
#> 25 iqvia_germany_da aml          1294  1294         1 Qualifying…      0       0
#> 26 iqvia_germany_da aml           822   822         2 no prior c…    472     472
#> 27 iqvia_germany_da aml           803   803         3 Age >= 18       19      19
#> 28 iqvia_germany_da leukemia     9661  9661         1 Qualifying…      0       0
#> 29 iqvia_germany_da leukemia     7735  7735         2 no prior c…   1926    1926
#> 30 iqvia_germany_da leukemia     7345  7345         3 Age >= 18      390     390

# after censoring
supress_cohort(cohort_attrition, 5) %>% 
  select(-cohort_definition_id) %>% 
  rename(n_rec = number_records, n_sub = number_subjects, 
         ex_rec = excluded_records, ex_subj = excluded_subjects) %>% 
  print(n=1e6)
#> # A tibble: 30 × 8
#>    cdm_name         cohort_name n_rec n_sub reason_id reason      ex_rec ex_subj
#>    <chr>            <chr>       <dbl> <dbl>     <int> <chr>        <dbl>   <dbl>
#>  1 ipci             aml            NA    NA         1 Qualifying…     NA      NA
#>  2 ipci             aml            NA    NA         2 no prior c…     NA      NA
#>  3 ipci             aml           193   193         3 Age >= 18       NA      NA
#>  4 ipci             leukemia     1278  1278         1 Qualifying…      0       0
#>  5 ipci             leukemia     1079  1079         2 no prior c…    199     199
#>  6 ipci             leukemia     1029  1029         3 Age >= 18       50      50
#>  7 cdm_gold_202307  aml           736   736         1 Qualifying…      0       0
#>  8 cdm_gold_202307  aml           531   531         2 no prior c…    205     205
#>  9 cdm_gold_202307  aml           517   517         3 Age >= 18       14      14
#> 10 cdm_gold_202307  leukemia     2886  2886         1 Qualifying…      0       0
#> 11 cdm_gold_202307  leukemia     2498  2498         2 no prior c…    388     388
#> 12 cdm_gold_202307  leukemia     2336  2336         3 Age >= 18      162     162
#> 13 sidiap           aml          2270  2270         1 Qualifying…      0       0
#> 14 sidiap           aml          1011  1011         2 no prior c…   1259    1259
#> 15 sidiap           aml           988   988         3 Age >= 18       23      23
#> 16 sidiap           leukemia     7686  7686         1 Qualifying…      0       0
#> 17 sidiap           leukemia     5313  5313         2 no prior c…   2373    2373
#> 18 sidiap           leukemia     5108  5108         3 Age >= 18      205     205
#> 19 ebb              aml            NA    NA         1 Qualifying…     NA      NA
#> 20 ebb              aml            NA    NA         2 no prior c…     NA      NA
#> 21 ebb              aml            52    52         3 Age >= 18       NA      NA
#> 22 ebb              leukemia      287   287         1 Qualifying…      0       0
#> 23 ebb              leukemia      202   202         2 no prior c…     85      85
#> 24 ebb              leukemia      202   202         3 Age >= 18        0       0
#> 25 iqvia_germany_da aml          1294  1294         1 Qualifying…      0       0
#> 26 iqvia_germany_da aml           822   822         2 no prior c…    472     472
#> 27 iqvia_germany_da aml           803   803         3 Age >= 18       19      19
#> 28 iqvia_germany_da leukemia     9661  9661         1 Qualifying…      0       0
#> 29 iqvia_germany_da leukemia     7735  7735         2 no prior c…   1926    1926
#> 30 iqvia_germany_da leukemia     7345  7345         3 Age >= 18      390     390

Created on 2024-05-16 with reprex v2.1.0

catalamarti commented 5 months ago

Hi @ablack3, if you use summariseCohortAttrition(cohort) or summary(cohort), you get a summarised_result object containing the attrition, which can then be suppressed with the suppress() method. Is this not working for you?

ablack3 commented 5 months ago

Oh, I did not know that was possible. I was using cohortAttrition(cohort). I'll try it.

catalamarti commented 5 months ago

There are also functions for tables and plots: tableCohortAttrition() and plotCohortAttrition() (although the former is still experimental).