Bayer-Group / pybalance

A library for minimizing the effects of confounding covariates
BSD 3-Clause "New" or "Revised" License

population as a mandatory column is unclear #21

Open abhishek-ch opened 1 week ago

abhishek-ch commented 1 week ago

Requiring a column named population does not make sense for every use case. Please make it data-driven.

sprivite commented 1 week ago

Can you give me a use case in which population is not needed?

abhishek-ch commented 1 week ago

If I create a dummy dataset on the impact of lifestyle choices such as smoking, exercise, blood pressure level, etc., the target column would be something like Treatment or Is_Patient. I couldn't relate a population column to such a scenario.

Dataset sample: (image not included)

sprivite commented 1 week ago

You're saying you want a different name for the column?

abhishek-ch commented 1 week ago

Would it be possible (and logical) to let any column serve as the population column? I can always make sure there is a column named population, but I'm not sure what values it should hold.

sprivite commented 1 week ago

Does this solve your issue?

https://bayer-group.github.io/pybalance/03_api.html#pybalance.utils.MatchingData

Note that __init__ can take population_col as an argument.
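To make the idea concrete, here is a minimal standard-library sketch of what a configurable population column amounts to: the column whose values say which group each row belongs to. The rows, the column names, and the helper split_by_population are hypothetical and are not part of pybalance. The commented-out call at the end is only an assumption about usage, based on the population_col argument mentioned above.

```python
from collections import defaultdict

# Hypothetical rows modelled on the lifestyle dataset described in this thread.
rows = [
    {"Smoking": 1, "Exercise": 0, "Blood_Pressure": 140, "Is_Patient": 1},
    {"Smoking": 0, "Exercise": 1, "Blood_Pressure": 118, "Is_Patient": 0},
    {"Smoking": 1, "Exercise": 1, "Blood_Pressure": 131, "Is_Patient": 1},
    {"Smoking": 0, "Exercise": 0, "Blood_Pressure": 122, "Is_Patient": 0},
    {"Smoking": 0, "Exercise": 1, "Blood_Pressure": 115, "Is_Patient": 0},
]


def split_by_population(rows, population_col="population"):
    """Group rows by the value stored in the chosen population column.

    Hypothetical helper for illustration only; it is not a pybalance function.
    """
    groups = defaultdict(list)
    for row in rows:
        if population_col not in row:
            raise KeyError(f"column {population_col!r} not found in row")
        groups[row[population_col]].append(row)
    return dict(groups)


# Any existing column can play the population role; here it is Is_Patient.
groups = split_by_population(rows, population_col="Is_Patient")
sizes = {label: len(members) for label, members in groups.items()}
print(sizes)

# Assumed pybalance equivalent (not verified here; see the linked API docs):
# m = MatchingData(df, population_col="Is_Patient")
```

In other words, the values in that column do not need to be special. They only need to distinguish the groups being compared, so a column like Is_Patient or Treatment can be used directly instead of adding a separate population column.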