facebookresearch / balance

The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
https://import-balance.org
GNU General Public License v2.0

[BUG] rake doesn't support trimming - but also doesn't indicate it to the user #69

Open talgalili opened 7 months ago

talgalili commented 7 months ago

This violates user expectations.

E.g.:

weights_untrimmed = sample.adjust(
    variables=weighting_variables,
    method="rake",
    weight_trimming_mean_ratio=0,
    transformations=auto_recodes,
)

By "expectation violation" I mean:

  1. Nothing in the docs suggests that rake doesn’t support weight_trimming_mean_ratio or weight_trimming_percentile.
  2. No error is thrown when a value is passed to either weight_trimming_mean_ratio or weight_trimming_percentile to indicate that rake doesn’t support it.
  3. This can result in people thinking the weights are getting trimmed when they aren’t.
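
To illustrate point 2, here is a minimal sketch of the kind of check that would surface the problem to the user. The helper name `check_trimming_args` and the set of methods it flags are hypothetical and not part of balance; only the two parameter names come from this issue.

```python
import warnings

# Parameter names taken from this issue.
TRIMMING_ARGS = ("weight_trimming_mean_ratio", "weight_trimming_percentile")


def check_trimming_args(method, methods_without_trimming=("rake",), **kwargs):
    """Warn about trimming arguments that `method` would silently ignore.

    Hypothetical helper (not part of balance). Returns the names of the
    ignored arguments so the caller can also decide to raise instead.
    """
    if method not in methods_without_trimming:
        return []
    # Any value other than None counts as "the user asked for trimming".
    ignored = [name for name in TRIMMING_ARGS if kwargs.get(name) is not None]
    if ignored:
        warnings.warn(
            f"method={method!r} does not apply weight trimming; "
            f"ignoring: {', '.join(ignored)}",
            UserWarning,
            stacklevel=2,
        )
    return ignored
```

With this sketch, `check_trimming_args("rake", weight_trimming_mean_ratio=0)` emits a `UserWarning` and returns `["weight_trimming_mean_ratio"]`, while the same call with `method="ipw"` returns an empty list. Raising a `ValueError` instead of warning would be the stricter alternative.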

(Reported by David Lovis-McMahon)

talgalili commented 7 months ago

This issue also happens for post-stratification. The locations of the functions to fix:

https://github.com/facebookresearch/balance/blob/0081b51a39783cfb245e3077927c96fcf0b3ffb0/balance/weighting_methods/rake.py#L26
https://github.com/facebookresearch/balance/blob/0081b51a39783cfb245e3077927c96fcf0b3ffb0/balance/weighting_methods/poststratify.py#L20

We might need to add it here: https://github.com/facebookresearch/balance/blob/0081b51a39783cfb245e3077927c96fcf0b3ffb0/balance/weighting_methods/rake.py#L243

As opposed to how it's solved for, say, CBPS: https://github.com/facebookresearch/balance/blob/0081b51a39783cfb245e3077927c96fcf0b3ffb0/balance/weighting_methods/cbps.py#L676
And IPW: https://github.com/facebookresearch/balance/blob/0081b51a39783cfb245e3077927c96fcf0b3ffb0/balance/weighting_methods/ipw.py#L178

Since there is always also a stage that normalizes the weights to the sum of weights of the target population, this might be more easily solved with a new function that trims the weights and then normalizes them to the target's sum of weights.
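
A rough sketch of what such a combined function could look like, in plain Python. The name `trim_and_normalize`, its signature, and the capping rule (cap each weight at `mean_ratio` times the mean weight) are assumptions made for illustration; they are not taken from balance's existing trimming code.

```python
def trim_and_normalize(weights, target_total, mean_ratio=None):
    """Cap weights at mean_ratio * mean(weights), then rescale to target_total.

    Hypothetical helper (not part of balance). `mean_ratio=None` skips
    trimming and only rescales.
    """
    weights = [float(w) for w in weights]
    if not weights:
        raise ValueError("weights must not be empty")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    if mean_ratio is not None:
        # Trimming step: no weight may exceed mean_ratio times the mean weight.
        cap = mean_ratio * (sum(weights) / len(weights))
        weights = [min(w, cap) for w in weights]

    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum after trimming")

    # Normalization step: rescale so the weights sum to the target's total.
    scale = target_total / total
    return [w * scale for w in weights]


print(trim_and_normalize([1, 1, 1, 9], target_total=90, mean_ratio=2))
# → [10.0, 10.0, 10.0, 60.0]
```

In the example the mean weight is 3, so the cap is 6 and the weight of 9 is trimmed before everything is scaled by 10. Note that trimming lowers the mean, so after rescaling the largest weight can again exceed `mean_ratio` times the new mean; whether to iterate until the ratio holds is a design decision this sketch leaves open.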