JuliaHealth / OMOPCDMCohortCreator.jl

Create cohorts from databases utilizing the OMOP CDM
https://juliahealth.org/OMOPCDMCohortCreator.jl/stable
Other
9 stars 7 forks source link

[FEATURE] Make a Better Default for `GetPatientAgeGroup` #73

Closed TheCedarPrince closed 7 months ago

TheCedarPrince commented 7 months ago

I've been working with GetPatientAgeGroup and realized: I don't like the default I built-in here. Basically, this piece of code in the core functionality:

    age_arr = []

    for grp in age_groupings
        push!(age_arr, Get.age .< grp[2] + 1)
        push!(age_arr, "$(grp[1]) - $(grp[2])")
    end

https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/blob/ac5bcf02a67ff0afe7868c6d77b61674daa2795e/src/getters.jl#L596

Should have one more value added to the age_arr and what it should be is a kind of "finalizer" where, if someone does not fit into the specified age buckets, they get placed into here. I think this finalized label could be called something like "Unspecified". It would entail creating a final case within the FunSQL block that would act as the generic ELSE clause: https://www.w3schools.com/sql/sql_case.asp

I don't have a quick way to implement this in my mind, but would love some help here! Happy to brainstorm. :)

TheCedarPrince commented 7 months ago

More explicitly, the way the FunSQL code renders the existing defaults is like this:

SELECT
  "person_2"."person_id",
  (CASE WHEN ("person_2"."age" < 10) THEN '0 - 9' WHEN ("person_2"."age" < 20) THEN '10 - 19' WHEN ("person_2"."age" < 30) THEN '20 - 29' WHEN ("person_2"."age" < 40) THEN '30 - 39' WHEN ("person_2"."age" < 50) THEN '40 - 49' WHEN ("person_2"."age" < 60) THEN '50 - 59' WHEN ("person_2"."age" < 70) THEN '60 - 69' WHEN ("person_2"."age" < 80) THEN '70 - 79' WHEN ("person_2"."age" < 90) THEN '80 - 89' END) AS "age_group"
FROM (
  SELECT
    "person_1"."person_id",
    (2024 - "person_1"."year_of_birth") AS "age"
  FROM "omop"."person" AS "person_1"
  WHERE ("person_1"."person_id" IN (1))
) AS "person_2"

Ideally, there should be an additional part in the case blog that says something like this:

SELECT
  "person_2"."person_id",
  (CASE WHEN ("person_2"."age" < 10) THEN '0 - 9' WHEN ("person_2"."age" < 20) THEN '10 - 19' WHEN ("person_2"."age" < 30) THEN '20 - 29' WHEN ("person_2"."age" < 40) THEN '30 - 39' WHEN ("person_2"."age" < 50) THEN '40 - 49' WHEN ("person_2"."age" < 60) THEN '50 - 59' WHEN ("person_2"."age" < 70) THEN '60 - 69' WHEN ("person_2"."age" < 80) THEN '70 - 79' WHEN ("person_2"."age" < 90) THEN '80 - 89' ELSE 'Unspecified' END) AS "age_group"
FROM (
  SELECT
    "person_1"."person_id",
    (2024 - "person_1"."year_of_birth") AS "age"
  FROM "omop"."person" AS "person_1"
  WHERE ("person_1"."person_id" IN (1))
) AS "person_2"

Looking at this example, it could actually be as simple as modifying the existing line:

https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/blob/ac5bcf02a67ff0afe7868c6d77b61674daa2795e/src/getters.jl#L606

to this:

Define(:age_group => Fun.case(age_arr...), "Unspecified") |>
TheCedarPrince commented 7 months ago

Otherwise, we get missings in the resulting dataframe for the age label which can just be a pain to work with:

10×4 DataFrame
 Row │ age_group  gender_concept_id  race_concept_id  count
     │ String?    Int32?             Int32?           Int64
─────┼──────────────────────────────────────────────────────
   1 │ 50 - 59                 8532             8527    403
   2 │ 40 - 49                 8532             8515     51
   3 │ 30 - 39                 8507             8516     20
   4 │ 50 - 59                 8507             8527    445
   5 │ 40 - 49                 8532             8516     59
   6 │ missing                 8532             8527    139
   7 │ 60 - 69                 8532             8516     43
   8 │ 50 - 59                 8532             8516     69
   9 │ 80 - 89                 8532             8527    102
  10 │ 30 - 39                 8507             8527    114

See above table for example.