donboyd5 opened this issue:
**Broad approach to short-term goals -- getting base, syn, and synadj**

We established short-term goals (from now to ~Feb 2019) in a 12/4/2018 phone call that gave @donboyd5 the lead on measuring weighted file quality, with input from the full team.
I anticipate the general approach as follows:
**General approach to comparisons of base, syn, and synadj**

I anticipate 4 kinds of comparisons that increase in complexity and in their ability to tell us how well syn and synadj are performing:
**Why do all 4 comparisons?**
One possible result of test 4 is that we may learn that a constructed file is good for analyzing some kinds of reforms but not others. That would be valuable information for users.
**One possible summary measure for "Comparison of base year tax law"**
One summary measure for item 3 in the list above (Comparison of base year tax law) would be to compute the cumulative distribution of weighted total tax (e06500) vs. AGI for our three files (the PUF, the synthetic PUF with synthesized weights, and the synthetic PUF with adjusted weights), where tax and AGI are obtained by running each file through Tax-Calculator.
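As a concrete starting point, here is a minimal sketch of that Tax-Calculator step, assuming Tax-Calculator's Python API; the file names (`puf.csv`, `synpuf.csv`, `synpuf_adj.csv`), the 2018 year, and the helper `tax_and_agi` are placeholders, and `Records`' default weight handling may need more care for the synthetic files than shown:

```python
# A minimal sketch, not a definitive implementation: run each file through
# Tax-Calculator under current law and pull out weight (s006), AGI (c00100),
# and income tax after credits (iitax) as the computed counterpart of total
# tax. File names are placeholders.
import pandas as pd
from taxcalc import Calculator, Policy, Records

def tax_and_agi(csv_path, year=2018):
    """Return a DataFrame of weight, AGI, and income tax under current law."""
    recs = Records(data=pd.read_csv(csv_path))
    calc = Calculator(policy=Policy(), records=recs)
    calc.advance_to_year(year)
    calc.calc_all()
    return calc.dataframe(['s006', 'c00100', 'iitax'])

base = tax_and_agi('puf.csv')           # actual PUF
syn = tax_and_agi('synpuf.csv')         # synthetic PUF, synthesized weights
synadj = tax_and_agi('synpuf_adj.csv')  # synthetic PUF, adjusted weights
```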
An exploratory look would put all 3 distributions on a graph (similar to the graph in issue #16 but with 3 lines). The comparison could be formalized with two goodness-of-fit statistics (one comparing the fit of syn to base, and one comparing the fit of synadj to base). I don't think a Kolmogorov-Smirnov test is the right choice because it is univariate, whereas this comparison involves 2 variables (total income tax and AGI), but I am sure we can choose an appropriate test.
It might also make sense to do the same comparisons on some of the underlying variables that will have a strong effect on tax calculations, such as major components of income and deductions.
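Continuing the sketch above, one illustrative way to draw the 3-line graph and quantify the gaps is the largest vertical distance between cumulative curves evaluated on a common AGI grid; this is only a stand-in for whatever formal test we settle on, and the grid bounds are arbitrary:

```python
# An illustrative sketch only: plot cumulative weighted tax vs. AGI for the
# three files (base, syn, synadj from the sketch above) and report the
# largest vertical gap between curves on a common AGI grid. This stand-in
# statistic is not the formal goodness-of-fit test discussed above.
import numpy as np
import matplotlib.pyplot as plt

def cum_tax_curve(df):
    """Cumulative weighted tax as a step function of AGI."""
    d = df.sort_values('c00100')
    return d['c00100'].to_numpy(), np.cumsum(d['s006'].to_numpy() *
                                             d['iitax'].to_numpy())

def eval_curve(grid, x, y):
    """Evaluate the cumulative step function at each grid point (0 below data)."""
    idx = np.searchsorted(x, grid, side='right')
    return np.where(idx > 0, y[np.maximum(idx - 1, 0)], 0.0)

def max_gap(df_a, df_b, grid):
    """Largest vertical distance between two cumulative-tax curves."""
    xa, ya = cum_tax_curve(df_a)
    xb, yb = cum_tax_curve(df_b)
    return np.max(np.abs(eval_curve(grid, xa, ya) - eval_curve(grid, xb, yb)))

grid = np.linspace(-1e5, 1e7, 1001)  # arbitrary AGI grid for illustration
for name, df in [('base', base), ('syn', syn), ('synadj', synadj)]:
    x, y = cum_tax_curve(df)
    plt.plot(x, y, label=name)
plt.xlabel('AGI')
plt.ylabel('Cumulative weighted total tax')
plt.legend()
plt.show()

print('syn vs base:   ', max_gap(syn, base, grid))
print('synadj vs base:', max_gap(synadj, base, grid))
```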
In #9 @feenberg wrote:
> I have a program that scores 35 or so plausible tax reforms with the PUF and another file. If the alternate file is just the PUF rounded to 2 digits, the scores are very close. I'd like to try the synth file again, but the first draft gave scores that were not good. I'll try again with the next version.
@donboyd5 responded that he looks forward to seeing it.
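@feenberg's scoring program isn't shown here, but a minimal sketch of what scoring one reform on two files might look like with Tax-Calculator follows; the top-rate reform, the year, and the file names are purely illustrative, and the reform-dict format varies by Tax-Calculator version:

```python
# A minimal sketch of scoring one reform on one file, not @feenberg's
# program: the reform (an illustrative top-rate increase), the year, and
# the file names are placeholders. The reform-dict format shown is for
# recent Tax-Calculator versions and may differ in older releases.
import pandas as pd
from taxcalc import Calculator, Policy, Records

def score_reform(csv_path, reform, year=2018):
    """Change in aggregate income tax ($ billions) relative to current law."""
    recs = Records(data=pd.read_csv(csv_path))
    pol = Policy()
    pol.implement_reform(reform)
    base_calc = Calculator(policy=Policy(), records=recs)
    reform_calc = Calculator(policy=pol, records=recs)
    for calc in (base_calc, reform_calc):
        calc.advance_to_year(year)
        calc.calc_all()
    delta = reform_calc.array('iitax') - base_calc.array('iitax')
    return (delta * base_calc.array('s006')).sum() / 1e9

reform = {'II_rt7': {2018: 0.42}}  # illustrative top-rate increase
print('PUF score:      ', score_reform('puf.csv', reform))
print('Synthetic score:', score_reform('synpuf.csv', reform))
```

Comparing such scores across the 35 reforms, file by file, is the kind of output that could feed directly into test 4.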
One thought: we are not synthesizing the 4 aggregated records (#15). When comparing the actual PUF to the synthetic PUF, it would be important to drop those 4 records from the actual PUF.
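For example, a filter along these lines would do it, assuming the aggregated records carry the usual RECID values 999996-999999 (worth verifying against our copy of the file):

```python
# A sketch of dropping the four aggregated PUF records before comparing;
# assumes (worth verifying) that they carry RECID values 999996-999999.
import pandas as pd

puf = pd.read_csv('puf.csv')
AGG_RECIDS = [999996, 999997, 999998, 999999]
puf = puf[~puf['RECID'].isin(AGG_RECIDS)].copy()
```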
cc: @feenberg
I think two of our most important file-quality goals should be having a file that is good for:
- Are these both crucial file-quality goals?
- Are there other crucial file-quality goals?
- How should we operationalize measuring file quality with these goals in mind?