JuliaHealth / OMOPCDMCohortCreator.jl

Create cohorts from databases utilizing the OMOP CDM
https://juliahealth.org/OMOPCDMCohortCreator.jl/stable

[BUG] Error with `outerjoin` in Getting Cohort Dispatch #77

Open TheCedarPrince opened 4 months ago

TheCedarPrince commented 4 months ago

I had seen this bug a few times and thought maybe I was just "using it wrong", but it just dawned on me that there is actually an error here: the `outerjoin` should also join on the `:subject_id` column, or else the join fails with duplicate column name errors.
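The failure can be reproduced with DataFrames.jl alone. This sketch assumes that the `(ids, subject_ids, conn)` method returns `:cohort_definition_id`, `:subject_id` and `:cohort_start_date` columns; `lookup` is a hypothetical in-memory stand-in for that result, and the dates are made up.

```julia
using DataFrames, Dates

# Rows passed to the DataFrame dispatch: one per (cohort, subject) pair.
input = DataFrame(cohort_definition_id = [1, 1, 2],
                  subject_id = [10, 11, 12])

# Hypothetical stand-in for the result of the (ids, subject_ids, conn) method:
# the same key columns plus :cohort_start_date (dates are made up).
lookup = DataFrame(cohort_definition_id = [1, 1, 2],
                   subject_id = [10, 11, 12],
                   cohort_start_date = Date.(["2020-01-01", "2020-02-01", "2021-03-15"]))

# Joining on :cohort_definition_id alone leaves :subject_id as a non-key column
# in both tables, so DataFrames.jl throws an ArgumentError about duplicate names.
failed = try
    outerjoin(lookup, input, on = :cohort_definition_id)
    false
catch err
    err isa ArgumentError
end

# makeunique = true silences the error, but renames the second column to
# :subject_id_1 and pairs every subject with every start date in its cohort.
crossed = outerjoin(lookup, input, on = :cohort_definition_id, makeunique = true)

# Joining on both key columns merges them, giving one row per subject.
fixed = outerjoin(lookup, input, on = [:cohort_definition_id, :subject_id])

println(failed)         # true
println(nrow(crossed))  # 5
println(nrow(fixed))    # 3
```

So `makeunique = true` is not a real fix: cohort 1 has two subjects on each side, which yields 2 × 2 = 4 cross-matched rows instead of 2.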

```julia
"""
function GetCohortSubjectStartDate(df::DataFrame, conn; tab = cohort)

Given a `DataFrame` with a `:cohort_definition_id` column and `:subject_id` column, return the `DataFrame` with an associated `:cohort_start_date` corresponding to a cohort's subject ID in the `DataFrame`

Multiple dispatch that accepts all other arguments like in `GetCohortSubjectStartDate(ids, conn; tab = cohort)`
"""
function GetCohortSubjectStartDate(
    df::DataFrame, 
    conn; 
    tab = cohort
)

    # Joins only on :cohort_definition_id, so :subject_id is a non-key column
    # in both inputs and the join fails with duplicate column names.
    return outerjoin(
        GetCohortSubjectStartDate(df[:, "cohort_definition_id"], df[:, "subject_id"], conn; tab = tab),
        df,
        on = :cohort_definition_id
    )

end
```
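A minimal sketch of the proposed fix, joining on both key columns. To keep it self-contained, the database-backed method is replaced by a hypothetical mock in which `conn` is an in-memory `DataFrame` standing in for the cohort table. The mock reuses the issue's function name for readability but is not the package's implementation, and `tab` is accepted only to mirror the original signature.

```julia
using DataFrames, Dates

# Hypothetical mock of the database-backed method: `conn` is an in-memory
# DataFrame playing the role of the cohort table; `tab` is ignored.
function GetCohortSubjectStartDate(ids::AbstractVector, subject_ids::AbstractVector,
                                   conn::DataFrame; tab = nothing)
    wanted = DataFrame(cohort_definition_id = ids, subject_id = subject_ids)
    rows = semijoin(conn, wanted, on = [:cohort_definition_id, :subject_id])
    return select(rows, :cohort_definition_id, :subject_id, :cohort_start_date)
end

# DataFrame dispatch with the proposed fix: join on both key columns so that
# :subject_id is merged rather than duplicated.
function GetCohortSubjectStartDate(df::DataFrame, conn; tab = nothing)
    starts = GetCohortSubjectStartDate(df[:, "cohort_definition_id"],
                                       df[:, "subject_id"], conn; tab = tab)
    return outerjoin(starts, df, on = [:cohort_definition_id, :subject_id])
end

# Usage with made-up data; :age is an extra column carried through the join.
cohort_table = DataFrame(cohort_definition_id = [1, 1, 2],
                         subject_id = [10, 11, 12],
                         cohort_start_date = Date.(["2020-01-01", "2020-02-01", "2021-03-15"]))
input = DataFrame(cohort_definition_id = [1, 2], subject_id = [10, 12], age = [40, 55])

result = GetCohortSubjectStartDate(input, cohort_table)
println(names(result))  # ["cohort_definition_id", "subject_id", "cohort_start_date", "age"]
```

Because both keys are used, each subject in `df` is matched only to its own start date; since this is an `outerjoin` rather than a `leftjoin`, subjects missing from the cohort table still come back, with `missing` in `:cohort_start_date`.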

@jay-sanjay, I am not sure how we missed this with the tests... Did we not have a test that covered a `DataFrame` with both `cohort_definition_id` and `subject_id` columns? I guess I am just surprised we missed this; ah well!

Jay-sanjay commented 4 months ago

Hi @TheCedarPrince, that's strange, because I think this test should have covered that case, right? https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/blob/f78c77e782c5ca8e74a32a2113a27a9227c09018/test/sqlite/getters.jl#L493-L494

TheCedarPrince commented 4 months ago

Weird!!! Are you able to see the error too, @Jay-sanjay? Let me see if I can put together a code example shortly so you can see what I am seeing.