OHDSI / CohortConstructor

https://ohdsi.github.io/CohortConstructor/
Apache License 2.0

Discuss matchCohorts() when there is more than one record per person within a cohort #204

Open martaalcalde opened 5 months ago

martaalcalde commented 5 months ago

Currently, matchCohorts() assumes that there is only one record per person within a cohort. If that assumption does not hold, each record is treated independently, so the same person can end up with several different matches. The example below illustrates this:

library(CohortConstructor)
library(dplyr)

cdm <- mockCohortConstructor(nPerson = 1000, seed = 0)
cdm$cohort1 |>
  dplyr::filter(subject_id == 3) |>
  matchCohorts(name = "new_cohort")
#> Starting matching
#> ℹ Creating copy of target cohort.
#> • 1 cohort to be matched.
#> ℹ Creating controls cohorts.
#> ℹ Excluding cases from controls
#> • Matching by gender_concept_id and year_of_birth
#> • Removing controls that were not in observation at index date
#> • Excluding target records whose pair is not in observation
#> • Adjusting ratio
#> Binding both cohorts
#> ✔ Done
#> # Source:   table<main.new_cohort> [4 x 5]
#> # Database: DuckDB v0.10.1 [root@Darwin 23.5.0:R 4.3.2/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date cluster_id
#>                  <int>      <int> <date>            <date>               <dbl>
#> 1                    1          3 2015-03-20        2015-03-29               1
#> 2                    1          3 2015-03-30        2015-04-05               2
#> 3                    2        147 2015-03-20        2016-01-10               1
#> 4                    2        509 2015-03-30        2017-05-20               2

It would be good to discuss whether this is the behaviour we should expect, or whether we should throw an error when a person is repeated in a cohort. For now, I've implemented the following warning message:

Warning: Multiple records per person detected. The matchCohorts() function is designed to operate under the assumption that there is only one record per person within each cohort. If this assumption is not met, each record will be treated independently. As a result, the same individual may be matched multiple times, leading to inconsistent and potentially misleading results.
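
For the discussion, here is a minimal sketch of how repeated subjects could be detected with plain dplyr. It is not necessarily how matchCohorts() performs the check internally: the helper name repeatedSubjects() and the toy cohort table are hypothetical, and only the column names are taken from the output above.

```r
library(dplyr)

# Toy cohort table (illustrative values only; column names follow the
# cohort output shown above).
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 1L),
  subject_id = c(3L, 3L, 7L),
  cohort_start_date = as.Date(c("2015-03-20", "2015-03-30", "2016-01-01")),
  cohort_end_date = as.Date(c("2015-03-29", "2015-04-05", "2016-02-01"))
)

# Hypothetical helper: one row per (cohort, subject) pair that has more
# than one record in the cohort table.
repeatedSubjects <- function(x) {
  x |>
    group_by(cohort_definition_id, subject_id) |>
    summarise(n_records = n(), .groups = "drop") |>
    filter(n_records > 1)
}

repeated <- repeatedSubjects(cohort)
print(repeated)
```

If repeated has at least one row, the function could either raise the warning above or stop with an error, which is the choice under discussion here.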

@edward-burn @catalamarti

Created on 2024-06-05 with reprex v2.1.0

edward-burn commented 5 months ago

Thanks for spotting this @martaalcalde. For the open PR, let's indeed just add this warning, but then let's discuss this