Intermediate representation of data?

bcbi / PreprocessMD.jl

Medically-informed data preprocessing for machine learning

MIT License


Intermediate representation of data? #4

Open AshlinHarris opened 2 years ago

AshlinHarris commented 2 years ago

Currently, pivot() relies on DataFrames.unstack(). It might be better to instead build an intermediate representation of the data, and then use that intermediate representation to construct the wide data frame.
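For context, here is a rough sketch of what an unstack-based pivot can look like. This is not the package's actual pivot() code; the column names (patient, medication, present) and the toy data are made up for illustration, and the only library calls assumed are `unique`, `nrow`, and `unstack` with its `fill` keyword from DataFrames.jl.

```julia
using DataFrames

# Hypothetical long-format input: one row per (patient, medication) record.
long = DataFrame(
    patient = [1, 1, 2, 3, 3, 3],
    medication = ["aspirin", "statin", "aspirin", "insulin", "statin", "statin"],
)

# Drop repeated (patient, medication) rows so each wide cell is defined once,
# then add an indicator column to serve as the unstacked value.
dedup = unique(long)
dedup.present = fill(true, nrow(dedup))

# Long to wide: one row per patient, one Bool column per medication.
# Cells with no matching record are filled with false.
wide_unstack = unstack(dedup, :patient, :medication, :present; fill = false)
```

Note that the deduplication and the indicator column exist only to satisfy unstack(); that bookkeeping is what an intermediate representation could absorb.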

AshlinHarris commented 2 years ago

For example, to keep track of which patients have which medications, instead of building a data frame directly, we could first build a Dict{Patient, Set{Medication}} in which the keys are patient IDs and each value is the set of medications for that patient. This Dict{Patient, Set{Medication}} would be a sort of "intermediate representation", which we can then use to build the wide DataFrame.
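A minimal sketch of that idea, under some assumptions: patient IDs and medications are taken straight from two columns of a long data frame, and the helper names build_intermediate and wide_from_intermediate are hypothetical, not functions that exist in PreprocessMD.jl.

```julia
using DataFrames

# Hypothetical long-format input: one row per (patient, medication) record.
long = DataFrame(
    patient = [1, 1, 2, 3, 3, 3],
    medication = ["aspirin", "statin", "aspirin", "insulin", "statin", "statin"],
)

# Step 1: intermediate representation, patient ID => set of medications.
function build_intermediate(patients, medications)
    K, V = eltype(patients), eltype(medications)
    ir = Dict{K, Set{V}}()
    for (p, m) in zip(patients, medications)
        # Create the patient's set on first sight, then add the medication.
        push!(get!(() -> Set{V}(), ir, p), m)
    end
    return ir
end

# Step 2: wide data frame with one row per patient and one Bool column
# per medication, built only from the intermediate representation.
function wide_from_intermediate(ir::Dict{K, Set{V}}) where {K, V}
    ids = sort!(collect(keys(ir)))
    meds = sort!(collect(union(Set{V}(), values(ir)...)))
    wide = DataFrame(patient = ids)
    for m in meds
        wide[!, string(m)] = [m in ir[p] for p in ids]
    end
    return wide
end

ir = build_intermediate(long.patient, long.medication)
wide = wide_from_intermediate(ir)
```

One thing this buys us: because the values are sets, repeated (patient, medication) records collapse on their own, so no separate deduplication step is needed before building the wide frame. Sorting the IDs and medication names is just a choice made here to get a deterministic row and column order.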

AshlinHarris commented 2 years ago

Related: #81