nikhilwoodruff closed this 2 years ago
Some additional administrative totals to consider:
Thanks, I've added all but the last two. I can't find a source for the 5-year age bins in years other than 2020, and I'm wondering whether we should include them at all, given that in future years they don't simply uprate upwards but mostly shift along (people age into the next bin).
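To illustrate why a single uprating factor doesn't fit age bins, here is a minimal sketch that rolls 5-year bin counts forward by one year. It assumes ages are spread uniformly within each bin (so a fifth of each bin moves up), an open-ended top bin, and no births, deaths or migration. `age_one_year` is a hypothetical helper and the counts are toy numbers, not population statistics.

```python
def age_one_year(bin_counts):
    """Roll 5-year age-bin counts forward one year.

    Assumes ages are uniform within each bin, so one fifth of each bin
    ages into the next. The top bin is open-ended, so nobody leaves it.
    Births, deaths and migration are ignored, so the first bin only shrinks.
    """
    moving_up = [count / 5 for count in bin_counts]
    aged = []
    for i, count in enumerate(bin_counts):
        is_top_bin = i == len(bin_counts) - 1
        staying = count if is_top_bin else count - moving_up[i]
        arriving = moving_up[i - 1] if i > 0 else 0.0
        aged.append(staying + arriving)
    return aged


print(age_one_year([500, 500, 1000]))
```

With these toy counts, one year of ageing gives `[400.0, 500.0, 1100.0]`: the total is unchanged, but the bins change by different ratios (0.8, 1.0 and 1.1), which no single uprating factor reproduces.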
The FRS has a well-documented problem with benefit reporting and high incomes: all benefits are under-reported, as are income sources such as dividends. Until now, we've assumed this is due to measurement error, i.e. that respondents give incorrect information about their income. However, I think we've now found enough evidence to suggest it is actually mostly sampling error:
After some initial trials, I propose an optimisation-based approach to re-weighting: using TensorFlow to optimise a weight adjustment vector that minimises statistical error across a range of validation statistics, while penalising substantial divergence from the initial weights. Essentially, we want to balance two goals: modified weights that bring us as close as possible to benefit and tax statistics, and weights that don't move too far from the initial ones.
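One way to write this objective down is sketched below. The specific forms are illustrative assumptions, not taken from the implementation: a log-scale adjustment vector $a$ (so weights stay positive), a mean squared relative error against each target, and a mean squared relative weight change scaled by a modification penalty $\lambda$. Here $n$ is the number of households, $k$ the number of statistics, $w_i^{0}$ the initial weight of household $i$, $x_{ij}$ its value of statistic $j$, and $T_j$ the administrative total for statistic $j$.

```latex
w_i = w_i^{0} e^{a_i}, \qquad
\mathcal{L}(a) = \frac{1}{k} \sum_{j=1}^{k} \left( \frac{\sum_{i=1}^{n} w_i x_{ij}}{T_j} - 1 \right)^{2}
+ \lambda \, \frac{1}{n} \sum_{i=1}^{n} \left( \frac{w_i}{w_i^{0}} - 1 \right)^{2}
```

Raising $\lambda$ keeps the weights closer to the originals at the cost of a larger error against the targets; lowering it does the reverse.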
Initial experimentation
The following graph illustrates the trade-off between weight edits and statistical error:
And here's an example result for one specific benefit and metric. Depending on the modification penalty, we can land closer to or further from the target:
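The trade-off in the graphs above can be reproduced in miniature. This is a hedged stand-in for the TensorFlow approach, not the actual implementation: it swaps in NumPy with hand-derived gradients instead of automatic differentiation, and assumes a log-scale adjustment vector (weights = initial weights × exp(a)), a mean squared relative error against the targets, and a penalty on the mean squared relative weight change. `reweight` and `relative_errors` are hypothetical names, and the four-household data and targets are toy values, not FRS data or administrative totals.

```python
import numpy as np


def relative_errors(weights, household_values, targets):
    """Relative error of the weighted survey totals against each target."""
    return household_values @ weights / targets - 1.0


def reweight(initial_weights, household_values, targets, penalty, lr=0.1, steps=2000):
    """Gradient-descent sketch of penalised re-weighting.

    household_values has shape (k, n): k statistics for n households.
    """
    w0 = np.asarray(initial_weights, dtype=float)
    k, n = household_values.shape
    a = np.zeros(n)  # log-scale adjustment vector, so weights stay positive
    for _ in range(steps):
        scale = np.exp(a)
        w = w0 * scale
        rel_err = relative_errors(w, household_values, targets)
        # Gradient of the mean squared relative error with respect to a
        grad_fit = w * (2.0 / k) * (household_values.T @ (rel_err / targets))
        # Gradient of penalty * mean((w / w0 - 1)^2) with respect to a
        grad_pen = penalty * (2.0 / n) * (scale - 1.0) * scale
        a -= lr * (grad_fit + grad_pen)
    return w0 * np.exp(a)


# Hypothetical toy survey: 4 households, 2 statistics
initial_weights = np.full(4, 100.0)
household_values = np.array([
    [1000.0, 0.0, 500.0, 0.0],   # benefit receipt per household
    [0.0, 2000.0, 0.0, 4000.0],  # dividend income per household
])
targets = np.array([200_000.0, 750_000.0])  # hypothetical administrative totals

for penalty in [0.01, 0.1, 1.0, 10.0]:
    w = reweight(initial_weights, household_values, targets, penalty)
    error = np.mean(relative_errors(w, household_values, targets) ** 2)
    change = np.mean((w / initial_weights - 1.0) ** 2)
    print(f"penalty={penalty:>5}: error={error:.4f}, weight change={change:.4f}")
```

Sweeping the penalty traces out the same kind of curve as the first graph: small penalties drive the error towards zero by moving the weights further, while large penalties leave the weights nearly untouched and more of the original error in place. The log-scale parameterisation is one design choice among several; it simply guarantees that no weight can go negative during optimisation.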
Process outline
We'll aim to match the following targets:
cc @MaxGhenis