JuliaHealth / OMOPCDMCohortCreator.jl

Create cohorts from databases utilizing the OMOP CDM
https://juliahealth.org/OMOPCDMCohortCreator.jl/stable

[FEATURE] Update Implicit Function APIs to Accept and/or Mutate a DataFrame #53

Closed TheCedarPrince closed 10 months ago

TheCedarPrince commented 1 year ago

This is a feature I have wanted for a while. I think every function in this package should be able to accept a DataFrame and, depending on the function, know how to index that DataFrame to retrieve the information it requires. Additionally, functions should perhaps join their results onto the passed-in DataFrame automatically.

The reason for these changes is that I often want to use the pattern:

using Chain
using DataFrames
using OMOPCDMCohortCreator

@chain patient_df begin
  GetPatientGender
  GetPatientRace
  GetPatientAgeGroup
  _[:, Not(:person_id)]
  groupby(_, names(_))
end

or even

Characterize(x) = (GetPatientGender ∘ GetPatientRace ∘ GetPatientAgeGroup)(x)

These patterns would let me do quick analyses and re-use them over and over again, clearly and explicitly. I am not sure how much of the API should change as a result, but the change would lend itself much better to function composition.
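One wrinkle with the patterns above is that they pass only the DataFrame, while the existing functions also take a connection. Below is a minimal sketch of one way to compose such steps by fixing the connection up front. The `demo_*` functions, `add_placeholder`, and `with_conn` are hypothetical stand-ins written for this illustration, not part of OMOPCDMCohortCreator, and they fill columns with `missing` instead of querying a database.

```julia
using DataFrames

# Hypothetical helper: return a copy of `df` with one extra column of `missing`.
# It stands in for a real database lookup so the sketch runs without a connection.
function add_placeholder(df::DataFrame, col::Symbol)
    out = copy(df)
    out[!, col] = fill(missing, nrow(df))
    return out
end

# Hypothetical stand-ins for DataFrame-accepting versions of the package functions
demo_gender(df::DataFrame, conn) = add_placeholder(df, :gender_concept_id)
demo_race(df::DataFrame, conn) = add_placeholder(df, :race_concept_id)
demo_age_group(df::DataFrame, conn) = add_placeholder(df, :age_group)

# Fix `conn` once so that each step becomes a one-argument function of a DataFrame
with_conn(f, conn) = df -> f(df, conn)

conn = nothing  # placeholder; a real connection would come from a database driver

# `∘` applies the rightmost function first: gender, then race, then age group
characterize = with_conn(demo_age_group, conn) ∘
               with_conn(demo_race, conn) ∘
               with_conn(demo_gender, conn)

patient_df = DataFrame(person_id = [1, 2, 3])
result = characterize(patient_df)
```

Because each step returns a new DataFrame, `patient_df` is left unchanged and `result` carries the three added columns.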

TheCedarPrince commented 12 months ago

Here is what we discussed in our call:


# Existing method that takes a vector of person ids
GetPatientGender([1, 2, 3], conn)

# Issue idea
using DataFrames 
df = DataFrame(person_id = [1, 2, 3])

GetPatientGender(df, conn)

# The DataFrame method "knows" which column it expects to find in the DataFrame
function GetPatientGender(df::DataFrame, conn; kwargs...)
  ids = df.person_id

  # New DataFrame with two columns: person_id, gender_concept_id,
  # returned by the existing method that takes a vector of person ids
  new_df = GetPatientGender(ids, conn; kwargs...)

  # With this part, try this out for one or two functions
  df = outerjoin(df, new_df, on = :person_id)

  # The columns of the DataFrame that was passed in, plus gender_concept_id.
  # Note: outerjoin returns a new DataFrame, so `df` is rebound here and
  # the caller's DataFrame is not mutated in place
  return df
end
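
Since the issue title mentions both accepting and mutating a DataFrame, here is a minimal sketch contrasting the two, assuming DataFrames.jl 1.3 or later for `leftjoin!`. The names `lookup_gender`, `add_gender`, and `add_gender!` are hypothetical and exist only for this illustration. `lookup_gender` stands in for the existing method that takes person ids, and it returns a placeholder value of 0 instead of querying a database.

```julia
using DataFrames

# Hypothetical stand-in for the existing person-id method.
# Returns a placeholder gender_concept_id of 0 for every id; no database is queried.
function lookup_gender(ids::AbstractVector, conn)
    uids = unique(ids)  # one row per person so the join keys are unique
    return DataFrame(person_id = uids, gender_concept_id = fill(0, length(uids)))
end

# Non-mutating variant: returns a new DataFrame and leaves the input untouched
function add_gender(df::DataFrame, conn)
    new_df = lookup_gender(df.person_id, conn)
    return leftjoin(df, new_df; on = :person_id)
end

# Mutating variant: adds the new column to the caller's DataFrame in place
function add_gender!(df::DataFrame, conn)
    new_df = lookup_gender(df.person_id, conn)
    leftjoin!(df, new_df; on = :person_id)
    return df
end

conn = nothing  # placeholder connection for the sketch

people = DataFrame(person_id = [1, 2, 3])
joined = add_gender(people, conn)   # `people` still has only person_id
add_gender!(people, conn)           # `people` now has gender_concept_id too
```

The non-mutating form composes cleanly in a `@chain` block, while the `!` form follows the Julia convention of marking functions that modify their arguments.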

Let me know if you have any questions -- thanks!

TheCedarPrince commented 10 months ago

Closed by #54