PolicyEngine / policyengine-uk

The UK's only open-source static tax-benefit microsimulation model.
https://policyengine.github.io/policyengine-uk/
GNU Affero General Public License v3.0

Ensure imputed capital gains CDF is valid (monotonic) #816

Open MaxGhenis opened 7 months ago

MaxGhenis commented 7 months ago

impute_capital_gains currently interpolates/extrapolates the provided quantiles to a CDF by fitting splines. This can result in CDFs that are not monotonically increasing and thus invalid.
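To make the failure mode concrete, here is a minimal sketch that reproduces it on made-up numbers. The quantile table is hypothetical (not PolicyEngine data), `is_monotonic` is a helper name introduced only for illustration, and SciPy's `CubicSpline` stands in for whatever spline `impute_capital_gains` actually fits.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical quantile table for one income group (not PolicyEngine data).
quantiles = np.array([0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99])
gains = np.array([0.0, 0.0, 100.0, 500.0, 5_000.0, 20_000.0, 200_000.0])


def is_monotonic(values, tol=1e-9):
    """Return True if consecutive values never decrease by more than tol."""
    return bool(np.all(np.diff(values) >= -tol))


# Assumption: CubicSpline is a stand-in for the spline used in the model.
spline = CubicSpline(quantiles, gains)
grid = np.linspace(quantiles[0], quantiles[-1], 1_000)
spline_gains = spline(grid)

print("input monotonic: ", is_monotonic(gains))
print("spline monotonic:", is_monotonic(spline_gains))
```

The inputs are non-decreasing, but the cubic passes through two equal values at the 10th and 25th percentiles and must bend between them, so the interpolated curve cannot be monotonic there.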

After asking ChatGPT for some ideas, I think a promising approach could be to first synthesize a PDF from the quantiles, smooth it with a kernel density estimator, and then integrate it to a CDF. Here's an example of how that might look:

image
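As a rough sketch of that three-step idea (not the code behind the image above), the following uses a hypothetical quantile table, builds a synthetic sample by linear interpolation between quantiles, smooths it with `scipy.stats.gaussian_kde`, and integrates the density numerically. The table, the grid padding, and the default bandwidth are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical quantile table (not PolicyEngine data).
quantiles = np.array([0.0, 0.25, 0.50, 0.75, 1.0])
gains = np.array([0.0, 200.0, 1_000.0, 3_000.0, 8_000.0])

# Step 1: synthesize a sample whose empirical quantiles match the table,
# treating the distribution as uniform between adjacent quantiles.
u = (np.arange(2_000) + 0.5) / 2_000
synthetic = np.interp(u, quantiles, gains)

# Step 2: smooth the implied PDF with a Gaussian kernel density estimate.
kde = gaussian_kde(synthetic)

# Step 3: integrate the PDF to a CDF with the trapezoid rule, then normalize.
pad = 3 * synthetic.std()
grid = np.linspace(synthetic.min() - pad, synthetic.max() + pad, 2_001)
pdf = kde(grid)
steps = (pdf[1:] + pdf[:-1]) / 2 * np.diff(grid)
cdf = np.concatenate(([0.0], np.cumsum(steps)))
cdf /= cdf[-1]

print("CDF is non-decreasing:", bool(np.all(np.diff(cdf) >= 0)))
```

Because the density is non-negative everywhere, its running integral cannot decrease, so the resulting CDF is monotonic by construction. One caveat with this sketch: a Gaussian kernel places some mass below the smallest observed gain, so the smoothed CDF no longer reproduces the input quantiles exactly.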

Other options like isotonic regression or transformations could also work, and we may want something more complex if we want to consider all the data together rather than each income group independently.
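For the isotonic regression option, a minimal sketch is a pool-adjacent-violators pass that replaces any decreasing run with its average. `isotonic_fit` and the sample values are hypothetical names and numbers used only for illustration; scikit-learn's `IsotonicRegression` provides a library implementation of the same idea.

```python
import numpy as np


def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in np.asarray(y, dtype=float):
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back out to one fitted value per input point.
    return np.concatenate(
        [np.full(count, total / count) for total, count in blocks]
    )


# Hypothetical interpolated gains with two small dips.
wiggly = np.array([0.0, 50.0, 40.0, 300.0, 250.0, 900.0])
fitted = isotonic_fit(wiggly)
print(fitted)
```

This repairs an existing curve rather than fitting a new one, so it produces flat steps wherever the input dipped. That may or may not be acceptable for a quantile function.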

MaxGhenis commented 7 months ago

FWIW I only saw one income level where the spline was obviously nonmonotonic, so might not be such a high priority:

image

https://policyengine-uk-documentation.nw.r.appspot.com/Capital_Gains_Tax

MaxGhenis commented 7 months ago

SciPy's PchipInterpolator (PCHIP: Piecewise Cubic Hermite Interpolating Polynomial) seems ideally suited to this. It both preserves monotonicity and supports extrapolation.

Here's an example for the 99th income centile, for which the spline currently produces a nonmonotonic interpolation.

image

Relevant code (notebook):

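import numpy as np
from scipy.interpolate import PchipInterpolator
# quantiles and gains are the provided quantile levels and gain values, defined earlier in the notebook.
pchip_interpolator = PchipInterpolator(quantiles, gains, extrapolate=True)
extended_quantiles = np.linspace(0.01, 0.99, 99)  # the 1st through 99th centiles
extended_gains = pchip_interpolator(extended_quantiles)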
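
For anyone who wants to try this without the notebook, here is a self-contained sketch of the same call on a hypothetical quantile table (not PolicyEngine data), with a monotonicity check added. `is_monotonic` is a helper introduced only for this example.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical quantile table for one income group (not PolicyEngine data).
quantiles = np.array([0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99])
gains = np.array([0.0, 0.0, 100.0, 500.0, 5_000.0, 20_000.0, 200_000.0])


def is_monotonic(values, tol=1e-9):
    """Return True if consecutive values never decrease by more than tol."""
    return bool(np.all(np.diff(values) >= -tol))


pchip_interpolator = PchipInterpolator(quantiles, gains, extrapolate=True)
extended_quantiles = np.linspace(0.01, 0.99, 99)
extended_gains = pchip_interpolator(extended_quantiles)

# Points at or above the lowest provided quantile are interpolated;
# points below it are extrapolated from the first polynomial piece.
interpolated = extended_quantiles >= quantiles[0]

print("monotonic within data range:", is_monotonic(extended_gains[interpolated]))
print("knots reproduced:", bool(np.allclose(pchip_interpolator(quantiles), gains)))
```

One caveat: PCHIP's monotonicity guarantee applies between the provided quantiles. Extrapolated values come from extending the end polynomial pieces, so it is probably worth running the same check on any extrapolated range as well.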