donboyd5 / synpuf

Synthetic PUF

Compare true and synthetic PUFs to administrative totals (targets) #24

Open MaxGhenis opened 5 years ago

MaxGhenis commented 5 years ago

When available, I think the administrative totals are the best thing to compare synthetic data against, and this will give us an idea of how the error compares to the true PUF's error. I'd rank this above secondary targets calculated from the PUF, curious what others think.

donboyd5 commented 5 years ago

I basically agree. However, there are two calculated variables of great importance: AGI (c00100) and some sort of total tax variable. We can think of AGI as an administrative variable that is a bit too complex to calculate in the synthesis step, so we get it from Tax-Calculator. Because tax results are so important to us, I think that once we go to the trouble of running Tax-Calculator, we should also look at a tax variable, since the marginal cost of doing so is near zero. I have been using tax before credit (taxbc). I like to be able to look at the distribution of the weighted variables against AGI.
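For concreteness, here is a minimal sketch (not the project's actual code) of what that Tax-Calculator step could look like in Python. The file name, the 2011 start year, and passing gfactors=None and weights=None so the file is used as-is are all assumptions, and details vary by Tax-Calculator version:

```python
import pandas as pd
from taxcalc import Records, Policy, Calculator

# Hypothetical input: a weighted synthetic file with PUF-style columns,
# including the decimal weight s006.
syn = pd.read_csv("synpuf4.csv")

# Build a current-law calculator on the synthetic records. gfactors=None and
# weights=None ask Records to use the file as-is rather than applying the
# packaged PUF grow factors and weights (an assumption about the desired setup).
recs = Records(data=syn, start_year=2011, gfactors=None, weights=None)
calc = Calculator(policy=Policy(), records=recs)
calc.calc_all()

# Pull the two calculated variables discussed above: AGI and tax before credits.
out = calc.dataframe(["s006", "c00100", "taxbc"])
print((out.s006 * out.c00100).sum())  # weighted AGI
print((out.s006 * out.taxbc).sum())   # weighted tax before credits
```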

I will post some preliminary results that show this momentarily.

The administrative totals are particularly good for diagnostics. They tell us where the file is going wrong. That provides useful information for improving the file.

However, AFTER we get a file that does well on administrative totals, we would hope it will do well on our end objective, which is tax policy analysis. At that point I think runs of tax reforms are particularly important. If we do a really good job of hitting administrative totals, the file should do well on reforms, but we still want to check: we won't think of every important administrative total to look at, at least initially, and even if we do, we may not be able to hit every one of them. In that case we'd like to know how much the weakness is costing us in policy-analysis usefulness.

donboyd5 commented 5 years ago

I am putting analysis results in a public Google Drive directory as they have no information that could be construed as confidential: https://drive.google.com/drive/folders/1eOZPrWjmmvanyxT51zGSMktqs4Pqo8-g?usp=sharing. These are from eval_2018-12-14.html. I'll create an issue with this link so that people can easily find it.

Here are some very preliminary results using c00100.

CAUTION: because this was a proof of concept (to myself), I only used a few variables in Tax-Calculator, so AGI does not yet reflect some very important income components. It includes wages, interest, dividends, and business income, but not capital gains, partnership income, or a host of other items. As a result, these graphs are just illustrative and grade the synpufs on a curve (sorry). Based on @feenberg's tax reform results, I expect they will not look as good once I get more variables into the tax calculation. Still, I think they are useful and will become more useful when I put all relevant variables into AGI.

I do agree (anticipating a comment) that it is useful to look at true CDFs as well: each synthesized variable against its own true (PUF) cumulative distribution, sometimes cut by marital status or other variables. We'll get there. It will be a few days.
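As a rough illustration of the kind of weighted-CDF comparison I have in mind, here is a pandas/matplotlib sketch (not the code in the R project; the file names, the s006 weight column, and the choice of e00300 are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt

def weighted_cdf(df, var, wt="s006"):
    """Return x values and the cumulative share of weight at or below each x."""
    d = df[[var, wt]].sort_values(var)
    return d[var].to_numpy(), (d[wt].cumsum() / d[wt].sum()).to_numpy()

puf = pd.read_csv("puf.csv")      # true PUF (hypothetical path)
syn = pd.read_csv("synpuf4.csv")  # a synthesized file (hypothetical path)

for label, df in [("true PUF", puf), ("synpuf4", syn)]:
    x, p = weighted_cdf(df, "e00300")  # taxable interest, for example
    plt.plot(x, p, label=label)

plt.xscale("symlog")  # handles the heavy skew and any zero or negative values
plt.xlabel("e00300 (taxable interest)")
plt.ylabel("cumulative share of weighted returns")
plt.legend()
plt.show()
```

Cuts by marital status would just repeat this within each MARS group.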

Here is weighted wages against agi:

[image: weighted wages vs. AGI]

And here is weighted taxbc against agi (CAUTION, again - this does not include full information for tax calculation so it is an easy test).

[image: weighted taxbc vs. AGI]

donboyd5 commented 5 years ago

All right, now there are a few CDFs of weighted values in synthesized files vs. admin totals in eval_2018-12-14.html. They appear in the output BEFORE we get to Tax-Calculator, which is where they belong (while I hate to say it, you are right). I may hold off on running Tax-Calculator in later versions until the file settles down more to our liking, as it adds a few minutes of running time.

Several CDFs look fairly good but several look very bad. Here is interest income (bad):

[image: CDF of weighted interest income]

All will look worse when we drop synpuf2 from the analysis, because in the graphs where synpuf4 looks good, synpuf2 draws the eye away from the smaller differences.

I'll update to do all continuous variables (probably not today). Later we'll add cuts by marital status. I'll think about some tables that would be worthwhile.

Next step is to upload the R project to github so that @andersonfrailey can start taking a whack at it.

donboyd5 commented 5 years ago

The file eval_excl_taxcalc_2018-12-14.html has CDFs of weighted values for all continuous variables. I haven't thought about what to do about negative variables.

donboyd5 commented 5 years ago

I have pushed the repo EvaluateWtdSynFile to https://github.com/donboyd5/EvaluateWtdSynFile.

This is an R project that (mostly) examines the weighted synthetic files in comparison to the weighted true PUF. It also does some exploratory analysis on unweighted variables and exploratory analysis of tax calculations. It writes an html file to a publicly viewable Google Drive folder.

I have cleaned it up a little so it should be easier for @andersonfrailey to make sense of, but you might still want to have a phone conversation with me after you take a look through it.

I updated it to put variable descriptions on CDF graph titles, so if @MaxGhenis looks at the updated eval_excl_taxcalc_2018-12-14.html, the graphs will be easier to understand.

MaxGhenis commented 5 years ago

Thanks Don, but could we move comparisons between true and synthetic PUFs to a separate issue? This one can be for comparing to administrative totals.

MaxGhenis commented 5 years ago

Maybe I didn't explain correctly: I'm interested in comparisons to SOI totals, which include AGI. So AGI being a Tax-Calculator-calculated variable doesn't pose a problem.

donboyd5 commented 5 years ago

I still don't quite understand what you mean by administrative totals or SOI totals. If by SOI totals you mean "true" totals, then by design a synthesized file will not hit those totals, because the PUF itself will not hit them. Here is p. 45 from the PUF 2011 documentation (it is part of a multi-page table). The SOI totals from their "best" data often differ from PUF totals by 0-3%.

That is a problem, but I don't think it is a synthesis problem. I think we should be trying in synthesis to produce a PUF. We then (as OSPC does now) have a separate problem of bringing the synthesized or actual PUF into alignment with better data about the real world. I don't think we should try to solve that here, and I'm not sure if you're proposing that we do. Can you elaborate?

[image: p. 45 of the PUF 2011 documentation, comparing PUF totals to SOI totals]

MaxGhenis commented 5 years ago

Yes, I mean SOI totals, and the true PUF's inability to match them is why I think this is interesting: it gives us a benchmark for errors. I'd argue a 3% difference between synthesis and PUF is worse for E00100 (where the PUF is off by 0.13%) than for E00800 (where the PUF is off by 13%), especially if that 3% error actually moves the synthesis closer to the SOI total.

This is in the vein of @feenberg's proposal to incorporate standard errors into comparisons between the synthesis and the PUF.
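To make the argument concrete, a small illustration with made-up totals (these are not real SOI or PUF values):

```python
def pct_error(total, benchmark):
    """Percent difference of a weighted total from a benchmark total."""
    return 100 * (total - benchmark) / benchmark

# Case 1: the PUF nearly matches SOI (like E00100, off ~0.13%); a 3% synthesis
# deviation from the PUF is then roughly a 3% deviation from the SOI total.
soi, puf = 100.0, 100.13
syn = puf * 1.03
print(pct_error(syn, soi))  # about +3.1% from SOI

# Case 2: the PUF is far from SOI (like E00800, off ~13%); a 3% deviation from
# the PUF can actually move the synthesis closer to the SOI total.
soi, puf = 100.0, 113.0
syn = puf * 0.97
print(pct_error(syn, soi))  # about +9.6% from SOI, better than the PUF's 13%
```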

donboyd5 commented 5 years ago

Now I understand your point. Happy to put comparisons between synthesized and true PUF in a separate issue. I'll open one.

I certainly agree that we should be less worried about large differences between synthesized and PUF for measures where the PUF is bad than for where the PUF is good.

In terms of targeting, though, I think of the initial targeting task as one of hitting targets constructed from the PUF rather than targets taken from SOI totals. If we skip that step, we won't have a file that tries to mimic the PUF, which means we can't then compare tax results calculated from a synthesized non-PUF to those from a true PUF.

After we do that, and as part of the larger taxdata project, it will be important to hit better measures of the real world.

I don't think you are saying we should skip the PUF-targeting step, though, are you? I think you are saying information on the difference between PUF and SOI is useful in thinking about how much to worry about differences from PUF targets. Is that correct?

MaxGhenis commented 5 years ago

If we had to choose, I think hitting SOI totals is more important than matching the PUF's sometimes-wrongness. We need both types of targets, but when SOI totals are available, I don't see as much value in also comparing against the corresponding PUF total.

Matching against a metric unknown to the model is weird right now, but this will be more relevant when we do more with the weights, potentially calibrating them to SOI totals, or at least targeting some level there. IMO we may as well define the evaluation criteria with that end state in mind.
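As a rough illustration of what targeting a single SOI total through the weights could look like (the SOI figure and file name here are made up, and a real calibration across many targets would need raking or constrained optimization rather than a uniform ratio adjustment):

```python
import pandas as pd

syn = pd.read_csv("synpuf4.csv")  # hypothetical synthesized file
soi_interest_total = 93.9e9       # made-up SOI target for taxable interest

# Uniform ratio adjustment: scale every weight so weighted taxable interest
# (e00300) hits the target. This shifts all other weighted totals too, which
# is exactly why multi-target calibration needs a more careful method.
current_total = (syn.s006 * syn.e00300).sum()
syn["s006_adj"] = syn.s006 * soi_interest_total / current_total

print((syn.s006_adj * syn.e00300).sum() / soi_interest_total)  # ~1.0
```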

feenberg commented 5 years ago

I think it would be best to defer any attempt to match the "truth" (published aggregates) till after we are satisfied we can match the PUF.

Dan


donboyd5 commented 5 years ago

Yes, I agree that first we must try to match the PUF so that we can evaluate how well the synthetic PUF compares to true PUF on tax calculations and tax reform calculations.

But I wholeheartedly agree that the subsequent focus needs to be on matching truth. Defining truth is more complicated than we might think, though, because of the different summary files SOI has available.

MaxGhenis commented 5 years ago

OK, but if you're putting work into defining targets from the PUF, wouldn't it make sense to define targets that don't have SOI counterparts? AFAIUI these will be a larger share of the total targets to match, and then there's less duplication of work when we later compare against SOI totals.

There are other ways we're quantifying PUF-matching, e.g. distance metrics, so matching against PUF totals that also have SOI counterparts doesn't seem like the most important task.

donboyd5 commented 5 years ago

They're all related, so I think we need to take a comprehensive approach when we try to hit SOI totals. For example, we have SOI totals for AGI, but we can't possibly try to hit those without adjusting components of income, so at that point the components will have to change, too. We have several sources of SOI data and need to think through what to try to hit and how; I think @andersonfrailey and others have spent time on that. What we want here is a PUF lookalike, and then later we run either the true PUF or the lookalike through a PUF-to-SOI routine that hits targets. Since a lot of this has already been done in taxdata, I think it makes sense to coordinate with that work.

We have great distributional detail we can pull from the PUF that will be important to try to come close to -- for example, capital gains by marital status and by income range within marital status; number of returns with gains, and average gain, by income range; and number with loss, and average loss. We want to take advantage of that richness, which will not be available to us in SOI totals.
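A sketch of how targets like these could be tabulated from the true PUF with pandas (the AGI ranges are illustrative, and the use of e01000 as the gains variable and Tax-Calculator-style column names such as MARS and s006 are assumptions):

```python
import numpy as np
import pandas as pd

puf = pd.read_csv("puf.csv")  # true PUF (hypothetical path)

# Illustrative AGI ranges, not an official SOI breakdown.
agi_bins = [-np.inf, 0, 25e3, 50e3, 100e3, 200e3, 1e6, np.inf]
puf["agi_group"] = pd.cut(puf.e00100, agi_bins)

# e01000 = net capital gain or loss (an assumption about which gains variable to use).
gains = puf[puf.e01000 > 0]
targets = gains.groupby(["MARS", "agi_group"], observed=True).apply(
    lambda g: pd.Series({
        "weighted_returns": g.s006.sum(),
        "weighted_gains": (g.s006 * g.e01000).sum(),
        "avg_gain": (g.s006 * g.e01000).sum() / g.s006.sum(),
    })
)
print(targets)

# The same pattern with puf.e01000 < 0 gives the number with losses and the average loss.
```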

All that said, I certainly would worry less about hitting the PUF value for E04250 than for E00200, per the table above.