PolicyEngine / policyengine-uk

The UK's only open-source static tax-benefit microsimulation model.
https://policyengine.github.io/policyengine-uk/
GNU Affero General Public License v3.0

Adjust FRS weights to match administrative statistics #504

Closed: nikhilwoodruff closed this issue 2 years ago

nikhilwoodruff commented 2 years ago

The FRS has a well-documented problem with benefit reporting and high incomes: benefits are under-reported across the board, as are income sources such as dividends. Up to now, we've assumed this is due to measurement error, i.e. that respondents give incorrect information about their income. However, over time I think we've gathered enough evidence to suggest it is actually mostly sampling error:

After some initial trials, I propose an optimisation-based approach to reweighting: use TensorFlow to optimise a weight adjustment vector that minimises statistical error across a range of validation statistics, while penalising substantial divergence from the initial weights. Essentially, we want to balance two goals: modified weights that bring us as close as possible to the benefit and tax statistics, and weights that don't move too far from the initial ones.
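One way to write this objective down, as a sketch under our own assumptions (multiplicative adjustments so weights stay positive, squared relative errors for the targets, and a quadratic penalty on relative weight changes; none of these choices are fixed by the proposal itself):

$$
\min_{\mathbf{a}} \; \frac{1}{K}\sum_{k=1}^{K} \left( \frac{\sum_{i=1}^{N} x_{ki}\, w_i(\mathbf{a}) - t_k}{t_k} \right)^{2} \;+\; \lambda \, \frac{1}{N}\sum_{i=1}^{N} \left( \frac{w_i(\mathbf{a}) - w_i^{0}}{w_i^{0}} \right)^{2}, \qquad w_i(\mathbf{a}) = w_i^{0}\, e^{a_i}
$$

Here $w_i^{0}$ is household $i$'s initial FRS weight, $x_{ki}$ is household $i$'s contribution to validation statistic $k$, $t_k$ is the administrative total for that statistic, $\mathbf{a}$ is the weight adjustment vector, and $\lambda$ is the modification penalty. Raising $\lambda$ keeps the weights closer to $w^{0}$ at the cost of a larger remaining error.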

Initial experimentation

The following graph illustrates the trade-off between weight edits and statistical error:

[Figure: trade-off between weight edits and statistical error]

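And here's an example result for one specific benefit and metric. Depending on the modification penalty, we can get closer to or further from the target:

[Figure: example result for one benefit and metric at different modification penalties]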
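To see the same kind of trade-off on a toy problem, here is a minimal sketch. It is not the implementation used in this issue: it swaps TensorFlow's automatic differentiation for plain NumPy gradient descent with hand-derived gradients, and it assumes multiplicative adjustments (`w = w0 * exp(a)`), squared relative target errors and a quadratic penalty on relative weight changes. The function name `reweight`, the three-household data, the target, the learning rate and the penalty values are all hypothetical.

```python
import numpy as np


def reweight(w0, x, targets, penalty, lr=0.1, steps=5000):
    """Hypothetical sketch: gradient descent on a log-scale weight adjustment vector.

    w0: initial weights, shape (N,)
    x: each household's contribution to each statistic, shape (K, N)
    targets: administrative totals, shape (K,)
    Returns (new weights, mean squared relative target error, mean squared relative weight edit).
    Note: lr must be small relative to the penalty for the iteration to stay stable.
    """
    a = np.zeros_like(w0)  # adjustment vector; w = w0 * exp(a) keeps weights positive
    for _ in range(steps):
        w = w0 * np.exp(a)
        rel_err = (x @ w - targets) / targets  # relative error per statistic
        edit = w / w0 - 1.0                    # relative change per weight
        # Gradient of mean(rel_err**2) with respect to w ...
        grad_w = 2.0 * x.T @ (rel_err / targets) / len(targets)
        # ... chained through w = w0 * exp(a), plus the gradient of
        # penalty * mean(edit**2) with respect to a (d edit / d a = edit + 1).
        grad_a = grad_w * w + penalty * 2.0 * edit * (edit + 1.0) / len(w0)
        a -= lr * grad_a
    w = w0 * np.exp(a)
    error = float(np.mean(((x @ w - targets) / targets) ** 2))
    edits = float(np.mean((w / w0 - 1.0) ** 2))
    return w, error, edits


# Toy survey: 3 households and 1 statistic whose survey total (6) is half the admin total (12)
w0 = np.ones(3)
x = np.array([[1.0, 2.0, 3.0]])
targets = np.array([12.0])

for penalty in [0.001, 0.1, 1.0, 10.0]:
    w, error, edits = reweight(w0, x, targets, penalty)
    print(f"penalty={penalty:<6} total={(x @ w)[0]:.2f} error={error:.4f} edits={edits:.4f}")
```

As the penalty grows, the remaining error rises and the weight edits shrink, which mirrors the trade-off described above. In practice the choice of penalty is a judgement call about how far the survey weights may be allowed to move.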

Process outline

We'll aim to match the following targets:

cc @MaxGhenis

MaxGhenis commented 2 years ago

Some additional administrative totals to consider:

nikhilwoodruff commented 2 years ago

Thanks, I've got all but the last two. I can't find a source for the 5-year age bins for years other than 2020, and I'm wondering whether we should include them at all, given that in future years they don't simply uprate upwards but mostly shift along as people move into the next bin.
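To illustrate why age bins shift along rather than scale up, here is a rough sketch. It assumes ages are spread evenly within each 5-year bin, so about a fifth of each bin ages into the next one every year, and it ignores births, deaths and migration; the function name `age_bins_one_year_on` and the bin counts are hypothetical.

```python
def age_bins_one_year_on(counts):
    """Hypothetical sketch: move 5-year age-bin populations forward by one year.

    Assumes a uniform age distribution within each bin and no births, deaths
    or migration. The last bin is treated as open-ended (e.g. 85+).
    """
    # The oldest single-year age in each closed bin (1 of 5 ages) moves up a bin.
    moving_on = [c / 5 for c in counts[:-1]]
    shifted = []
    for i, c in enumerate(counts):
        stays = c - moving_on[i] if i < len(counts) - 1 else c  # open-ended top bin keeps everyone
        arrives = moving_on[i - 1] if i > 0 else 0.0             # inflow from the bin below
        shifted.append(stays + arrives)
    return shifted


# Three made-up bins: the total is unchanged, but the bins change by -20%, +20% and +50%,
# so no single uprating factor reproduces them.
print(age_bins_one_year_on([100, 50, 20]))
```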