[FEATURE] Add a warning message to Balance when trying to run very large/imbalanced weights

facebookresearch / balance

The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.

https://import-balance.org

GNU General Public License v2.0

688 stars 42 forks

[FEATURE] Add a warning message to Balance when trying to run very large/imbalanced weights #84

Open talgalili opened 4 months ago

talgalili commented 4 months ago

Add a warning message to Balance when running on very large or imbalanced inputs (say, more than roughly 100k cases and a population frame that is more than 10x the sample). The thinking here is that if the population is >10x the sample, then basically all of the standard error in comparing sample vs. population comes from the sample rather than the population. It's comparable to a 1-sample t-test rather than a 2-sample t-test.
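To put a rough number on the argument above: for an unweighted difference in means with equal variance in both groups (both simplifying assumptions, not something stated in the issue), the two-sample standard error is sqrt(1/n + 1/N) times the standard deviation, versus sqrt(1/n) for the one-sample case. The ratio is sqrt(1 + n/N), so a 10x target inflates the standard error by only about 5%. The helper name `se_inflation` is made up for this sketch and is not part of balance.

```python
import math


def se_inflation(n_sample: int, n_target: int) -> float:
    """Ratio of the two-sample SE of a mean difference to the one-sample SE.

    Assumes unweighted data and equal variance in sample and target, so
    sqrt(1/n + 1/N) / sqrt(1/n) simplifies to sqrt(1 + n/N).
    """
    if n_sample <= 0 or n_target <= 0:
        raise ValueError("both sizes must be positive")
    return math.sqrt(1.0 + n_sample / n_target)


# How much the target's own sampling noise adds as the target grows.
for ratio in (1, 2, 10, 100):
    inflation = se_inflation(1_000, 1_000 * ratio)
    print(f"target = {ratio:>3}x sample -> SE inflation {inflation:.4f}")
```

With weights in play, the effective sample size would probably be a better input than the raw row count, but the shape of the argument stays the same.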

Idea from: Ben Mainwaring

talgalili commented 4 months ago

TODO (thoughts):

Decide on a ratio of target/sample (it should probably also depend on the model matrix; it might be worth finding a good rule of thumb from the literature).
Maybe offer to do the sampling for the user, via an argument (with control of the seed)?
Should the warning be during the loading of the data, or when running adjust, or both?
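A minimal sketch of how the check and the optional seeded downsampling could fit together, using only the standard library. Everything here is hypothetical: the function name `target_rows_to_keep`, its arguments, and the default thresholds (100k rows, 10x ratio, taken from the numbers floated above) are placeholders for discussion, not existing balance API. It works on row counts so it could be called either at data loading or inside adjust.

```python
import random
import warnings
from typing import List, Optional


def target_rows_to_keep(
    n_sample: int,
    n_target: int,
    max_ratio: float = 10.0,          # assumed threshold, still to be decided
    min_target_rows: int = 100_000,   # assumed threshold, still to be decided
    downsample: bool = False,
    seed: Optional[int] = None,
) -> Optional[List[int]]:
    """Warn if the target is very large relative to the sample.

    Returns None when all target rows should be kept, otherwise a sorted
    list of row positions forming a uniform random subsample of the target.
    """
    too_large = n_target > min_target_rows and n_target > max_ratio * n_sample
    if not too_large:
        return None

    warnings.warn(
        f"Target has {n_target} rows, more than {max_ratio:g}x the sample "
        f"({n_sample} rows). Most of the sampling error comes from the sample, "
        "so a smaller target would likely give similar results faster.",
        UserWarning,
        stacklevel=2,
    )
    if not downsample:
        return None

    # Seeded, without-replacement draw so the result is reproducible.
    k = int(max_ratio * n_sample)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_target), k))
```

If the target lives in a pandas DataFrame, the returned positions could be applied with `target_df.iloc[keep]`. Uniform subsampling keeps each row's existing weight, so the weighted target distribution is preserved in expectation.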