darwin-eu-dev / omopgenerics

https://darwin-eu-dev.github.io/omopgenerics/
Apache License 2.0

warning in bind results - repeated group identifiers #398

Closed edward-burn closed 3 days ago

edward-burn commented 1 month ago

@catalamarti not sure if this is for omopgenerics or CohortCharacteristics, but I see the error below when binding the results

library(CohortCharacteristics)

cdm <- mockCohortCharacteristics()
results <- list()

results[["cohort_counts"]] <- cdm$cohort1 |>
  CohortCharacteristics::summariseCohortCount()
#> ℹ summarising data
#> ✔ summariseCharacteristics finished!
omopgenerics::newSummarisedResult(results[["cohort_counts"]]) |>
  dplyr::glimpse()
#> Rows: 6
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "PP_MOCK", "PP_MOCK", "PP_MOCK", "PP_MOCK", "PP_MOCK"…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort_1", "cohort_2", "cohort_3", "cohort_1", "coho…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Number records", "Number records", "Number records",…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count"
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "5", "2", "3", "5", "2", "3"
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

results[["cohort_summary"]] <- cdm$cohort1 |>
  CohortCharacteristics::summariseCharacteristics()
#> ℹ adding demographics columns
#> ℹ summarising data
#> ✔ summariseCharacteristics finished!
omopgenerics::newSummarisedResult(results[["cohort_summary"]]) |> 
  dplyr::glimpse()
#> Rows: 111
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "PP_MOCK", "PP_MOCK", "PP_MOCK", "PP_MOCK", "PP_MOCK"…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort_1", "cohort_3", "cohort_2", "cohort_1", "coho…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Number records", "Number records", "Number records",…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "5", "3", "2", "5", "3", "2", "1920-07-09", "1984-12-…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

omopgenerics::bind(results) |> 
  omopgenerics::newSummarisedResult()
#> Error in `checkGroupCount()`:
#> ! Each groupping (unique combination of: result_id, cdm_name,
#>   group_name, group_level, strata_name, and strata_level) can not contain
#>   repeated group identifiers (number subjects and number records).
#> First 5 combinations:
#> • 2 'number subjects' in variable_name for: result_id: 1, cdm_name: PP_MOCK,
#>   group_name: cohort_name, group_level: cohort_1, strata_name: overall,
#>   strata_level: overall.
#> • 2 'number subjects' in variable_name for: result_id: 1, cdm_name: PP_MOCK,
#>   group_name: cohort_name, group_level: cohort_2, strata_name: overall,
#>   strata_level: overall.
#> • 2 'number subjects' in variable_name for: result_id: 1, cdm_name: PP_MOCK,
#>   group_name: cohort_name, group_level: cohort_3, strata_name: overall,
#>   strata_level: overall.
#> • 2 'number records' in variable_name for: result_id: 1, cdm_name: PP_MOCK,
#>   group_name: cohort_name, group_level: cohort_1, strata_name: overall,
#>   strata_level: overall.
#> • 2 'number records' in variable_name for: result_id: 1, cdm_name: PP_MOCK,
#>   group_name: cohort_name, group_level: cohort_2, strata_name: overall,
#>   strata_level: overall.
#> Backtrace:
#>      ▆
#>   1. ├─omopgenerics::newSummarisedResult(omopgenerics::bind(results))
#>   2. │ └─omopgenerics:::assertClass(x = x, class = "data.frame")
#>   3. │   └─omopgenerics:::assertNull(x, nm, null, msg, call)
#>   4. ├─omopgenerics::bind(results)
#>   5. └─omopgenerics:::bind.list(results)
#>   6.   ├─base::do.call(bind, ...)
#>   7.   ├─omopgenerics (local) `<fn>`(cohort_counts = `<smmrsd_r[,13]>`, cohort_summary = `<smmrsd_r[,13]>`)
#>   8.   └─omopgenerics:::bind.summarised_result(...)
#>   9.     └─omopgenerics::newSummarisedResult(...)
#>  10.       └─omopgenerics:::validateSummariseResult(x)
#>  11.         └─omopgenerics:::checkGroupCount(x)
#>  12.           └─cli::cli_abort(res)
#>  13.             └─rlang::abort(...)

Created on 2024-07-13 with reprex v2.0.2

catalamarti commented 1 month ago

The problem is that the output of summariseCohortCount is a subset of the output of summariseCharacteristics. When bind combines the results, it detects that they have the same settings, so it tries to merge them into a single result, and the count rows then appear twice. For me the fix belongs in CohortCharacteristics: summariseCohortCount and summariseCharacteristics should have different settings (at least a different result_type). Happy to discuss @edward-burn
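To make the explanation above concrete, here is a minimal sketch of how stacking two results that share a result_id repeats the count rows. The data frames are toy stand-ins with only a few of the 13 columns, and the duplicate count is an illustration, not the actual checkGroupCount implementation.

```r
# Toy stand-ins for the two results; NOT real omopgenerics objects.
cohort_counts <- data.frame(
  result_id = 1L,
  group_level = c("cohort_1", "cohort_2"),
  variable_name = "Number subjects",
  estimate_value = c("5", "2")
)
cohort_summary <- data.frame(
  result_id = 1L,
  group_level = c("cohort_1", "cohort_2", "cohort_1"),
  variable_name = c("Number subjects", "Number subjects", "Age"),
  estimate_value = c("5", "2", "40")
)

# Stacking the rows keeps both copies of every shared count.
bound <- dplyr::bind_rows(cohort_counts, cohort_summary)

# Count rows per grouping and variable; n > 1 marks a repeated identifier.
repeated <- bound |>
  dplyr::count(result_id, group_level, variable_name) |>
  dplyr::filter(n > 1)
print(repeated)
```

Both cohorts end up with two "Number subjects" rows under the same result_id, which is the situation the error message reports.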

catalamarti commented 3 days ago

This was fixed by adding a distinct in bind.summarised_result.
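A rough sketch of why a distinct step resolves this, assuming it refers to dplyr::distinct() applied to the stacked rows. The toy data frames below are hypothetical and carry only three columns; the real change lives inside omopgenerics and may differ in detail.

```r
# Toy rows shared by both results (hypothetical, not package output).
counts <- data.frame(
  group_level = c("cohort_1", "cohort_2"),
  variable_name = "Number subjects",
  estimate_value = c("5", "2")
)

# The second result repeats both rows and adds one of its own.
summary_rows <- rbind(
  counts,
  data.frame(
    group_level = "cohort_1",
    variable_name = "Age",
    estimate_value = "40"
  )
)

# Stacking gives 5 rows, 2 of which are exact duplicates.
bound <- dplyr::bind_rows(counts, summary_rows)

# distinct() keeps one copy of each identical row, leaving 3 rows.
deduplicated <- dplyr::distinct(bound)
print(deduplicated)
```

Note that distinct() only drops rows that match in every column. If the two results reported different values for the same count, the rows would both survive and the repeated group identifier check would still fail.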