PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

Solve for ratios of new weights to original weights, rather than directly solving for new weights #188

Closed: donboyd5 closed this 2 months ago

donboyd5 commented 2 months ago

This change addresses two problems I was having with lsq_linear:

(1) The new weights were sometimes implausibly large, e.g., thousands of times as large as the original weights.

(2) My efforts to constrain the weights by setting upper bounds on the new weights made the problem difficult for lsq_linear to solve. With upper bounds, it took thousands of iterations and several minutes and still left large differences from the targets. Without upper bounds, problems solved in about a dozen iterations and in less than a second.

It is common in reweighting efforts to solve for the ratio of new weights to original weights, rather than solving for the new weights directly. In concept, the two approaches can lead to the same result. Solving for ratios is preferred for at least two reasons:

(1) Many reweighting efforts (e.g., JCT and taxdata) seek to minimize changes in weights, so they penalize weight changes through the ratio of new to original weights. This makes sense for national efforts, although it makes less sense when constructing subnational files from a national file.

(2) The problem often seems more numerically stable when the variable being solved for, x, is centered near 1 (a ratio). A weight, by contrast, can range from close to zero to possibly many thousands.
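A minimal sketch of the ratio formulation, assuming SciPy's `scipy.optimize.lsq_linear` and a hypothetical layout in which `A[i, j]` is record j's contribution to target i per unit of weight. `solve_weight_ratios`, `ratio_bounds`, and `penalty` are illustrative names, not this project's code. The default ratio bounds of (0, 10) are an arbitrary assumption.

The key identity is `A @ (w0 * r) == (A * w0) @ r`: scaling each column by its original weight turns the weight problem into a linear problem in the ratios `r`. Bounds on `r` then cap weight changes relative to each record's original weight.

```python
import numpy as np
from scipy.optimize import lsq_linear


def solve_weight_ratios(A, w0, targets, ratio_bounds=(0.0, 10.0), penalty=0.0):
    """Hypothetical helper: solve for r = w_new / w0 so that A @ (w0 * r) ~= targets.

    A            : (n_targets, n_records) per-unit-weight contributions (assumed layout)
    w0           : (n_records,) original weights
    targets      : (n_targets,) target totals
    ratio_bounds : (lower, upper) bounds applied to every ratio (assumed defaults)
    penalty      : optional weight on sum((r - 1)**2), pulling ratios toward 1
    """
    A = np.asarray(A, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Divide each target row by |target| so residuals are relative differences.
    scale = np.where(targets != 0, np.abs(targets), 1.0)

    # Column-scale A by the original weights: the unknown becomes the ratio r.
    B = (A * w0) / scale[:, None]
    b = targets / scale

    if penalty > 0:
        n = w0.size
        # Extra residual rows sqrt(penalty) * (r - 1) penalize departures from 1.
        B = np.vstack([B, np.sqrt(penalty) * np.eye(n)])
        b = np.concatenate([b, np.sqrt(penalty) * np.ones(n)])

    res = lsq_linear(B, b, bounds=ratio_bounds)
    ratios = res.x
    return ratios, w0 * ratios


# Usage: three records, two targets (a count and an income total).
A = np.array([[1.0, 1.0, 1.0], [10.0, 20.0, 30.0]])
w0 = np.array([100.0, 200.0, 300.0])
targets = np.array([660.0, 15000.0])
ratios, new_weights = solve_weight_ratios(A, w0, targets)
```

Because the bounds apply to ratios, a single pair of scalar bounds means the same thing for every record. An upper bound on the weights themselves would have to be chosen record by record.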

This PR:

Comparing the results of this PR with my earlier attempts to set bounds on the new weights directly, I have found that:

Thus, the preliminary results are promising. As we move from hypothetical problems to real-world problems, I think we will have to keep our options open. We will undoubtedly encounter new issues. We may also have reason to revisit two questions: whether to solve for weights or ratios, and whether to use a dedicated least-squares solver such as lsq_linear or a more general solver such as L-BFGS-B.
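For comparison, here is a sketch of the same ratio problem posed for a general bound-constrained solver. It assumes SciPy's `scipy.optimize.minimize` with `method="L-BFGS-B"` and a hand-coded gradient. The function name `solve_ratios_lbfgsb`, the matrix layout, the (0, 10) bounds, and the tolerance settings are all illustrative assumptions, not the project's implementation.

The objective is the sum of squared relative target misses plus an optional penalty on `(r - 1)**2`.

```python
import numpy as np
from scipy.optimize import minimize


def solve_ratios_lbfgsb(A, w0, targets, ratio_bounds=(0.0, 10.0), penalty=0.0):
    """Hypothetical helper: minimize ||B r - b||^2 + penalty * ||r - 1||^2 with L-BFGS-B."""
    A = np.asarray(A, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Relative-difference scaling, and column scaling by w0 so the unknown is r.
    scale = np.where(targets != 0, np.abs(targets), 1.0)
    B = (A * w0) / scale[:, None]
    b = targets / scale

    def objective(r):
        resid = B @ r - b
        dev = r - 1.0
        f = resid @ resid + penalty * (dev @ dev)
        # Analytic gradient of f with respect to r.
        grad = 2.0 * (B.T @ resid) + 2.0 * penalty * dev
        return f, grad

    n = w0.size
    res = minimize(
        objective,
        x0=np.ones(n),              # start every ratio at 1 (no change)
        jac=True,                   # objective returns (value, gradient)
        method="L-BFGS-B",
        bounds=[ratio_bounds] * n,  # same (lower, upper) bound for every ratio
        options={"ftol": 1e-14, "gtol": 1e-10, "maxiter": 1000},
    )
    return res.x, w0 * res.x


A = np.array([[1.0, 1.0, 1.0], [10.0, 20.0, 30.0]])
w0 = np.array([100.0, 200.0, 300.0])
targets = np.array([660.0, 15000.0])
ratios, new_weights = solve_ratios_lbfgsb(A, w0, targets)
```

Both approaches minimize the same objective. lsq_linear exploits the linear least-squares structure directly. L-BFGS-B only needs a function value and gradient, which makes it easier to swap in non-quadratic penalties later.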

donboyd5 commented 2 months ago

@martinholmer

My PR is failing with some sort of bad credentials error (see below). I am guessing it is related to access to the 2015 PUF, which I have locally of course. I don't know how to fix this. Advice?

[Screenshot: error output from the failing check]