darwin-eu-dev / PatientProfiles

https://darwin-eu-dev.github.io/PatientProfiles/
Apache License 2.0

addConceptIntersectFlag will not consider records where end date is missing #664

Closed ablack3 closed 2 months ago

ablack3 commented 3 months ago

Suppose I want to add flags for a condition concept to my cohort table, but some of those condition records have missing end dates. addConceptIntersectFlag will not consider these records. This behavior changed in version 1.0.0 and had a significant effect on study results. In PatientProfiles 0.6, records with missing end dates were considered when adding flags.

I'm not sure if this was a conscious decision or is a bug. For my study I think we did want to include records where end dates were missing.

Here is a reprex.

library(CDMConnector)
library(dplyr, warn.conflicts = FALSE)
con <- DBI::dbConnect(duckdb::duckdb(), eunomia_dir())

cohort <- dplyr::tibble(
  cohort_definition_id = 1,
  subject_id = 273L,
  cohort_start_date = as.Date("2012-10-10"),
  cohort_end_date = as.Date("2013-10-10")
)

DBI::dbWriteTable(con, "cohort", cohort)

cdm <- cdm_from_con(con, 
                    cdm_schema = "main", 
                    write_schema = "main", 
                    cohort_tables = "cohort") %>% 
  cdm_subset(person_id = 273L)

# notice the missing condition end date
cdm$condition_occurrence %>% 
  select(2:4, 6)
#> # Source:   SQL [?? x 4]
#> # Database: DuckDB v0.10.1 [root@Darwin 23.0.0:R 4.3.1//private/var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T/RtmpA3wsEn/file17d748f3e4df.duckdb]
#>    person_id condition_concept_id condition_start_date condition_end_date
#>        <int>                <int> <date>               <date>            
#>  1       273               192671 2011-10-10           NA                
#>  2       273             40481087 1996-07-06           1996-07-27        
#>  3       273             40481087 2002-08-02           2002-08-09        
#>  4       273              4112343 1983-09-19           1983-10-01        
#>  5       273               260139 2014-03-27           2014-04-10        
#>  6       273              4230399 2001-09-27           2001-11-26        
#>  7       273               133834 1980-03-28           1993-01-19        
#>  8       273              4112343 2005-09-26           2005-10-09        
#>  9       273             40481087 2014-11-03           2014-11-17        
#> 10       273                28060 1976-02-14           1976-02-27        
#> # ℹ more rows

# check that we do have observation time covering the condition occurrence
cdm$cohort %>% 
  PatientProfiles::addDemographics() %>% 
  select(matches("start_date|obs"))
#> # Source:   SQL [1 x 3]
#> # Database: DuckDB v0.10.1 [root@Darwin 23.0.0:R 4.3.1//private/var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T/RtmpA3wsEn/file17d748f3e4df.duckdb]
#>   cohort_start_date prior_observation future_observation
#>   <date>                        <int>              <int>
#> 1 2012-10-10                    13541               2329

cdm$cohort %>% 
  PatientProfiles::addConceptIntersectFlag(
    conceptSet = list(a = 192671L),
    window = c(-Inf, 0)
  )
#> # Source:   table<og_014_1716476059> [1 x 5]
#> # Database: DuckDB v0.10.1 [root@Darwin 23.0.0:R 4.3.1//private/var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T/RtmpA3wsEn/file17d748f3e4df.duckdb]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date a_minf_to_0
#>                  <dbl>      <int> <date>            <date>                <dbl>
#> 1                    1        273 2012-10-10        2013-10-10                0

# fill in end dates
cdm$condition_occurrence <- cdm$condition_occurrence %>% 
  mutate(condition_end_date = coalesce(condition_end_date, condition_start_date))

cdm$cohort %>% 
  PatientProfiles::addConceptIntersectFlag(
    conceptSet = list(a = 192671L),
    window = c(-Inf, 0)
  )
#> # Source:   table<og_025_1716476060> [1 x 5]
#> # Database: DuckDB v0.10.1 [root@Darwin 23.0.0:R 4.3.1//private/var/folders/xx/01v98b6546ldnm1rg1_bvk000000gn/T/RtmpA3wsEn/file17d748f3e4df.duckdb]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date a_minf_to_0
#>                  <dbl>      <int> <date>            <date>                <dbl>
#> 1                    1        273 2012-10-10        2013-10-10                1

DBI::dbDisconnect(con, shutdown = TRUE)

Created on 2024-05-23 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Sonoma 14.0
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Amsterdam
#>  date     2024-05-23
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package         * version date (UTC) lib source
#>  backports         1.4.1   2021-12-13 [1] CRAN (R 4.3.0)
#>  blob              1.2.4   2023-03-17 [1] CRAN (R 4.3.0)
#>  CDMConnector    * 1.4.0   2024-05-02 [1] local
#>  checkmate         2.3.1   2023-12-04 [1] CRAN (R 4.3.1)
#>  cli               3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  DBI               1.2.2   2024-02-16 [1] CRAN (R 4.3.1)
#>  dbplyr            2.5.0   2024-03-19 [1] CRAN (R 4.3.1)
#>  digest            0.6.35  2024-03-11 [1] CRAN (R 4.3.1)
#>  dplyr           * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
#>  duckdb            0.10.1  2024-04-02 [1] CRAN (R 4.3.1)
#>  evaluate          0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi             1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap           1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs                1.6.4   2024-04-25 [1] CRAN (R 4.3.1)
#>  generics          0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  glue              1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools         0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
#>  jsonlite          1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
#>  knitr             1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle         1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr          2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  omopgenerics      0.2.0   2024-04-30 [1] CRAN (R 4.3.1)
#>  PatientProfiles   1.0.0   2024-05-16 [1] CRAN (R 4.3.3)
#>  pillar            1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig         2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr             1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache           0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3       1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo              1.26.0  2024-01-24 [1] CRAN (R 4.3.1)
#>  R.utils           2.12.3  2023-11-18 [1] CRAN (R 4.3.1)
#>  R6                2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex            2.1.0   2024-01-11 [1] CRAN (R 4.3.1)
#>  rlang             1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown         2.26    2024-03-05 [1] CRAN (R 4.3.1)
#>  rstudioapi        0.16.0  2024-03-24 [1] CRAN (R 4.3.1)
#>  sessioninfo       1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  snakecase         0.11.1  2023-08-27 [1] CRAN (R 4.3.0)
#>  stringi           1.8.3   2023-12-11 [1] CRAN (R 4.3.1)
#>  stringr           1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
#>  styler            1.10.3  2024-04-07 [1] CRAN (R 4.3.1)
#>  tibble            3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr             1.3.1   2024-01-24 [1] CRAN (R 4.3.1)
#>  tidyselect        1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
#>  utf8              1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs             0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr             3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun              0.43    2024-03-25 [1] CRAN (R 4.3.1)
#>  yaml              2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

ablack3 commented 3 months ago

@catalamarti if you think this function should be picking up records with missing end dates I can make a PR to make this change. Would be good for me to try contributing to this package I think. My proposal would be to use

end_date = coalesce(end_date, start_date) 

on the overlap_table in addIntersect. This way all records would have end dates.
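
To make the idea concrete, here is a minimal local sketch using plain dplyr on an in-memory tibble. It is not the package's actual code: the `overlap_table` columns, the finite two-year look-back window, and the `in_window()` helper are all assumptions made for illustration, and the real overlap logic inside addIntersect may differ. It only shows why a missing end date can silently drop a record from a date comparison, and how the coalesce avoids that.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the overlap table built inside addIntersect;
# the column names follow the proposal above, the rows mirror the reprex.
overlap_table <- tibble(
  subject_id = c(273L, 273L),
  start_date = as.Date(c("2011-10-10", "1996-07-06")),
  end_date   = as.Date(c(NA, "1996-07-27"))
)

# Assumed finite look-back window ending at the cohort start date
window_end   <- as.Date("2012-10-10")
window_start <- window_end - 730

# Simplified overlap condition (an assumption, not the package's code)
in_window <- function(x) {
  x %>% filter(start_date <= window_end, end_date >= window_start)
}

# An NA end date makes the comparison NA, and filter() drops the row
records_before <- in_window(overlap_table)

# Fall back to the start date when the end date is missing
records_after <- overlap_table %>%
  mutate(end_date = coalesce(end_date, start_date)) %>%
  in_window()

nrow(records_before)
#> [1] 0
nrow(records_after)
#> [1] 1
```

With the coalesce, a record with a missing end date is treated as a one-day event on its start date, which is the same assumption the workaround in the reprex makes.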

catalamarti commented 2 months ago

#665