Open MaxGhenis opened 7 months ago
FWIW I only saw one income level where the spline was obviously nonmonotonic, so might not be such a high priority:
https://policyengine-uk-documentation.nw.r.appspot.com/Capital_Gains_Tax
The PCHIP Interpolator seems ideally suited to this. It both preserves monotonicity and supports extrapolation.
Here's an example for the 99th income centile, for which the spline currently produces a nonmonotonic interpolation.
Relevant code (notebook):
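import numpy as np
from scipy.interpolate import PchipInterpolator
# quantiles and gains: the provided quantile levels and the capital gains at each level
pchip_interpolator = PchipInterpolator(quantiles, gains, extrapolate=True)
extended_quantiles = np.linspace(0.01, 0.99, 99)
extended_gains = pchip_interpolator(extended_quantiles)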
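For anyone who wants to try this without the notebook, here is a minimal self-contained sketch comparing PCHIP with a cubic spline. The quantile levels and gains values are made up for illustration and are not the data behind the docs page. Note that PCHIP's monotonicity guarantee applies between the supplied points; extrapolated values reuse the end polynomial and are worth checking separately.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Hypothetical quantile levels and capital gains values (not from the real data).
quantiles = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])
gains = np.array([0.0, 0.0, 100.0, 1_000.0, 5_000.0, 50_000.0, 200_000.0])

# Evaluate both interpolants on a fine grid inside the range of the data.
grid = np.linspace(0.05, 0.95, 181)
pchip_values = PchipInterpolator(quantiles, gains)(grid)
spline_values = CubicSpline(quantiles, gains)(grid)

# A valid quantile function never decreases; allow a tiny floating-point tolerance.
pchip_is_monotonic = bool(np.all(np.diff(pchip_values) >= -1e-9))
spline_is_monotonic = bool(np.all(np.diff(spline_values) >= -1e-9))

print("PCHIP monotonic:", pchip_is_monotonic)
print("Cubic spline monotonic:", spline_is_monotonic)
```

The same check could be run per income group to find which groups the current spline gets wrong.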
impute_capital_gains currently interpolates/extrapolates the provided quantiles to a CDF by fitting splines. This can result in CDFs that are not monotonically increasing and thus invalid.
After asking ChatGPT for some ideas, I think a promising approach could be first synthesizing a PDF from the quantiles, smoothing it with a kernel density estimator, then integrating it to a CDF. Here's an example of how that might look:
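A rough sketch of that pipeline, written from the description rather than taken from the original example. The quantile values are hypothetical, the synthetic sample is drawn by linearly interpolating the quantile function (one of several possible choices), and the bandwidth is left at the `gaussian_kde` default.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical quantile levels and capital gains values (not from the real data).
quantiles = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])
gains = np.array([0.0, 50.0, 200.0, 1_000.0, 3_000.0, 8_000.0, 15_000.0])

# Step 1: synthesize a sample by evaluating a piecewise-linear quantile
# function at evenly spaced probabilities (an assumption of this sketch).
probabilities = np.linspace(quantiles[0], quantiles[-1], 500)
synthetic_sample = np.interp(probabilities, quantiles, gains)

# Step 2: smooth the sample into a PDF with a Gaussian kernel density estimate.
kde = gaussian_kde(synthetic_sample)
pad = 0.1 * (gains[-1] - gains[0])
grid = np.linspace(gains[0] - pad, gains[-1] + pad, 1_000)
pdf = kde(grid)

# Step 3: integrate the PDF with the trapezoid rule and normalize so the CDF
# ends at 1. The PDF is nonnegative, so the CDF cannot decrease.
steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
cdf = np.concatenate([[0.0], np.cumsum(steps)])
cdf /= cdf[-1]

print("CDF monotonic:", bool(np.all(np.diff(cdf) >= 0)))
```

Because the CDF is built by accumulating a nonnegative density, monotonicity holds by construction. The trade-off is that the smoothed CDF will generally not pass exactly through the provided quantiles.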
Other options like isotonic regression or transformations could also work, and we may want something more complex if we want to consider all the data together rather than each income group independently.
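To make the isotonic regression option concrete, here is a small sketch that repairs a nonmonotonic curve with the pool-adjacent-violators algorithm. The helper `pava` and the sample values are hypothetical; scikit-learn's `IsotonicRegression` offers the same kind of fit if adding that dependency is acceptable.

```python
import numpy as np


def pava(values):
    """Least-squares nondecreasing fit to `values` with equal weights."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to its original length, filled with the block mean.
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


# Hypothetical interpolated values containing one dip.
raw_curve = np.array([1.0, 3.0, 2.0, 4.0])
fitted_curve = pava(raw_curve)
print(fitted_curve)  # [1.  2.5 2.5 4. ]
```

This flattens each violating stretch to its mean, so the result is monotonic but can contain flat segments, which may or may not be acceptable for a CDF.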