The balance Python package offers a simple workflow and methods for dealing with biased data samples when inferring from them to a target population of interest.
Add a warning message to Balance when the user tries to fit weights on very large or imbalanced inputs (say, more than roughly 100k cases and a population frame that is 10x the sample).
The thinking here is that, if the population is >10x the sample, then nearly all of the standard error in the sample-vs-population comparison comes from the sample rather than the population. It’s comparable to a 1-sample t-test rather than a 2-sample t-test.
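A rough sketch of that arithmetic, assuming unweighted means and the same standard deviation in the sample and the population (both are simplifying assumptions, and the function names are made up for illustration). With n_target = ratio * n_sample, the two-sample standard error is the one-sample standard error multiplied by sqrt(1 + 1/ratio):

```python
import math


def se_two_sample(sigma, n_sample, n_target):
    """SE of (sample mean - target mean), assuming a common sigma in both groups."""
    return sigma * math.sqrt(1 / n_sample + 1 / n_target)


def se_one_sample(sigma, n_sample):
    """SE of the sample mean alone, treating the target mean as known."""
    return sigma / math.sqrt(n_sample)


def se_inflation(ratio):
    """Two-sample SE divided by one-sample SE when n_target = ratio * n_sample."""
    return math.sqrt(1 + 1 / ratio)


if __name__ == "__main__":
    for ratio in (1, 10, 100):
        print(ratio, round(se_inflation(ratio), 4))
```

Under these assumptions, a target that is 10x the sample adds only about 5% to the standard error compared with treating the target as fixed, and a 100x target adds about 0.5%.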
Decide on a target/sample ratio threshold (it should probably also depend on the model matrix; it might be worth finding a good rule of thumb in the literature).
Maybe offer to downsample the target for the user, via an argument (with control of the seed)?
Should the warning be raised when the data is loaded, when running adjust, or both?
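To make the open questions above easier to discuss, here is a minimal sketch of what the check and the optional downsampling could look like. Nothing here is balance's actual API: the function names, parameters, and thresholds are hypothetical, the 100k limit is read as applying to the target frame (only one possible interpretation), and the target is treated as a plain unweighted sequence of rows.

```python
import random
import warnings

# Hypothetical defaults, taken from the numbers floated in this issue.
MAX_TARGET_ROWS = 100_000
MAX_TARGET_TO_SAMPLE_RATIO = 10


def check_target_size(n_sample, n_target,
                      max_rows=MAX_TARGET_ROWS,
                      max_ratio=MAX_TARGET_TO_SAMPLE_RATIO):
    """Warn and return True if the target is both large and far larger than the sample."""
    too_large = n_target > max_rows and n_target > max_ratio * n_sample
    if too_large:
        warnings.warn(
            f"Target has {n_target} rows, more than {max_ratio}x the sample "
            f"({n_sample} rows). Consider downsampling the target.",
            UserWarning,
            stacklevel=2,
        )
    return too_large


def downsample_target(target_rows, n_sample,
                      max_ratio=MAX_TARGET_TO_SAMPLE_RATIO, seed=None):
    """Keep at most max_ratio * n_sample target rows, reproducibly for a given seed."""
    rows = list(target_rows)
    keep = max_ratio * n_sample
    if len(rows) <= keep:
        return rows  # already small enough, nothing to drop
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return rng.sample(rows, keep)
```

The same check could be called from both the data-loading step and the adjust step, which would cover the "or both" option. A weighted target would need a sampling scheme that respects the existing weights, which this sketch does not attempt.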
Idea from: Ben Mainwaring