Of possible interest to @andersonfrailey @MattHJensen @rickecon @jdebacker:
I've been trying to figure out why some of my 2017 numbers differ substantially from official TaxData and Tax-Calculator results, so I looked at the 2017 wage data. As noted in the title, I think TaxData is dropping $121 billion of millionaires' wages, which would help explain why my distributions of income and other items are so different from TaxData's. Here are the key facts and inferences:
I can see that taxdata puf stage 2 targets for wages by income range appear to come from the same source as mine: the IRS file "17in14ar.xls" (17=2017, in=individual income tax, 14=Table 1.4, ar=all returns -- landing page).
For example, here is an edited screenshot of the stage 2 targets, trimmed to show only wages for 2015-2017:
And here is a screenshot of the wage portion of IRS Table 1.4:
If you look at the first 2017 wage target, labeled "Wages and Salaries: Zero or Less", it matches the IRS "no adjusted gross income" value of $20.869 billion. So far so good, although the target label implies wage bins, whereas the IRS data are based on AGI bins.
(Before going further, let me note that this raises a side issue: the IRS data list values by 2017 AGI range, for tax filers. When I retarget the puf, I first calculate AGI for the year in question, then determine filing status, and then apply targets to filers in the corresponding AGI bins. Unless TaxData calculates AGI before targeting, which I don't think it does, although I need to read the code to be sure, it must be targeting all records rather than just filers and, more importantly, using some different definition of AGI. I seem to recall it might use AGI from the 2011 base year, which is far removed in time from 2017. If that's how it's done, it would seem to produce substantially incorrect wage distributions for 2017 once AGI is calculated. But that's not the question I'm writing about here.)
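A minimal sketch of the filer-based binning step described above, assuming each record already carries AGI computed for the target year, a filer flag, a weight, and wages. The dict keys (`agi`, `wages`, `weight`, `is_filer`) and the coarsened bin edges are hypothetical illustrations, not TaxData's data layout or the full set of IRS Table 1.4 ranges.

```python
from bisect import bisect_right

# Hypothetical, coarsened lower edges of AGI bins in dollars (illustration only;
# IRS Table 1.4 has many more ranges below $500k).
AGI_BIN_EDGES = [1, 10_000, 500_000, 1_000_000, 1_500_000, 10_000_000]
AGI_BIN_LABELS = [
    "no AGI",            # AGI below $1
    "$1 under $10k",
    "$10k under $500k",
    "$500k under $1m",
    "$1m under $1.5m",
    "$1.5m under $10m",
    "$10m or more",
]


def wage_totals_by_agi_bin(records):
    """Sum weighted wages by target-year AGI bin, counting filers only.

    Each record is a dict with hypothetical keys 'agi', 'wages', 'weight',
    and 'is_filer'; AGI is assumed to be computed for the target year already.
    """
    totals = {label: 0.0 for label in AGI_BIN_LABELS}
    for rec in records:
        if not rec["is_filer"]:
            continue  # the IRS table covers filers only, so skip non-filers
        label = AGI_BIN_LABELS[bisect_right(AGI_BIN_EDGES, rec["agi"])]
        totals[label] += rec["weight"] * rec["wages"]
    return totals
```

Binning on target-year AGI for filers only is what makes the resulting totals comparable, bin for bin, with the IRS filer table.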
The second target ($1 < $10k) of $86.507 billion matches the sum of the next 2 IRS AGI ranges, which cover $1-10k, so that looks good. And the next few ranges that I looked at also look good. For example, the $500k-1m range target and corresponding IRS value both are $379.376 billion.
But when we get to millionaires, we have a problem, as far as I can tell. TaxData shows a target of $367.732 billion. Let me zoom in on that here:
However, the IRS sum for millionaires is $489.1 billion, a full $121 billion greater. The screenshot below zooms in on the IRS data and calculates the sum for millionaires, and the sum for those at $1.5+ million, leaving out the $1-1.5 million group.
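The arithmetic behind the "$121 billion" figure, using only the two 2017 numbers quoted above (in billions of dollars):

```python
# 2017 wage figures quoted above, in billions of dollars.
irs_millionaire_wages = 489.1        # IRS Table 1.4: sum over all AGI ranges of $1 million or more
taxdata_millionaire_target = 367.732  # TaxData stage 2 target for the millionaire bin

gap = irs_millionaire_wages - taxdata_millionaire_target
print(f"Shortfall: ${gap:.3f} billion")  # prints "Shortfall: $121.368 billion"
```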
As you can see, the puf targets appear to have dropped the $121 billion in wages for those with 2017 AGI of $1-1.5 million. A quick scan across the millionaire row suggests this happened in all years.
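A dropped bin like this can be caught mechanically by checking that every IRS AGI range feeds exactly one target bin. The sketch below is a hypothetical check, not TaxData's code; the bin labels are abbreviations of the Table 1.4 ranges at $500k and above, and `TARGET_MAP` is a made-up mapping that reproduces the suspected omission.

```python
def uncovered_irs_bins(irs_bins, target_map):
    """Return (missing, doubled): IRS bins used by no target bin, or by more than one.

    irs_bins: ordered list of IRS AGI-range labels.
    target_map: dict mapping each target-bin label to the IRS labels it sums.
    """
    counts = {b: 0 for b in irs_bins}
    for sources in target_map.values():
        for b in sources:
            counts[b] += 1  # raises KeyError if a target cites an unknown IRS bin
    missing = [b for b in irs_bins if counts[b] == 0]
    doubled = [b for b in irs_bins if counts[b] > 1]
    return missing, doubled


# Abbreviated Table 1.4 ranges from $500k up.
IRS_BINS = ["$500k under $1m", "$1m under $1.5m", "$1.5m under $2m",
            "$2m under $5m", "$5m under $10m", "$10m or more"]

# Hypothetical mapping that reproduces the suspected bug: "$1m+" skips $1m-1.5m.
TARGET_MAP = {
    "$500k-1m": ["$500k under $1m"],
    "$1m+": ["$1.5m under $2m", "$2m under $5m", "$5m under $10m", "$10m or more"],
}

missing, doubled = uncovered_irs_bins(IRS_BINS, TARGET_MAP)
print(missing, doubled)  # prints: ['$1m under $1.5m'] []
```

Checking coverage of the bin mapping, rather than only spot-checking values, flags an omission in every year at once.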
Assuming I did my work correctly, this means that puf.csv, as reweighted and grown, will generally have too little in wages for millionaires even if targeting controls the overall level of wages, and it will show some distributional oddities as records are reweighted in an effort to hit wage targets that are too low for millionaires.
In recent work I did to construct state weights for the puf, I created tables that compared the PUF to IRS values for many variables, at 2017 levels, just for records that pass my 2017 filing screen, which is based on IRS rules and some inferences. Here is the table for salaries and wages. The target column is the IRS value (in dollars, sorry about the specious precision), puf is the latest puf.csv using all weight and growth defaults, diff is puf minus target, and pdiff is diff as a % of target. As you can see, after default growth and weighting the puf has total wages quite close to 2017 IRS total wages (-1.6%), but about $44.5 billion too little in wages for $10-millionaires. Furthermore, if you look across the ranges you can see a lot of maldistribution.
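A sketch of how the diff and pdiff columns are defined, assuming the IRS targets and PUF values are dicts keyed by AGI-bin label; the function name and the appended total row are illustrative additions, not part of the original tables' code.

```python
def compare_to_targets(target, puf):
    """Return rows of (bin, target, puf, diff, pdiff), plus a 'total' row.

    diff = puf - target; pdiff = 100 * diff / target (percent of target).
    Bins missing from `puf` are treated as zero.
    """
    rows = []
    for label, tgt in target.items():
        val = puf.get(label, 0.0)
        diff = val - tgt
        pdiff = 100.0 * diff / tgt if tgt != 0 else float("nan")
        rows.append((label, tgt, val, diff, pdiff))

    # A close overall total can hide large offsetting errors across bins.
    tot_t = sum(target.values())
    tot_p = sum(puf.get(k, 0.0) for k in target)
    tot_pdiff = 100.0 * (tot_p - tot_t) / tot_t if tot_t != 0 else float("nan")
    rows.append(("total", tot_t, tot_p, tot_p - tot_t, tot_pdiff))
    return rows
```

The total row is why a -1.6% aggregate difference is not reassuring on its own: per-bin errors of opposite sign cancel in the sum.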
The next table shows the number of filer returns with nonzero wages according to the IRS and according to the puf (using my filer rule). You can see that the number of returns with wages also gets pushed around considerably, but not in step with the wage amounts, so average wages in some AGI bins may be quite far off as well.
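Why mismatched counts and amounts distort averages: average wage per return is amount divided by count, so errors in the two compound. A minimal sketch with a hypothetical helper name; for example, wages 5% low combined with 10% too many returns leaves the average about 13.6% low (0.95 / 1.10 ≈ 0.864).

```python
def average_wage_ratio(irs_amount, irs_returns, puf_amount, puf_returns):
    """Ratio of PUF to IRS average wages per wage-reporting return in one AGI bin.

    Equivalent to (puf_amount / irs_amount) / (puf_returns / irs_returns).
    """
    irs_avg = irs_amount / irs_returns
    puf_avg = puf_amount / puf_returns
    return puf_avg / irs_avg
```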
This last table shows the distribution of 2017 AGI for 2017 filers for the puf grown to 2017 with defaults, compared to 2017 IRS values. As you can see, the puf has $265.9 billion too little AGI for $10-millionaires, which is only slightly offset by positive values for poorer millionaires. There could be a lot of reasons for this that go beyond the topic of this note, but it does suggest, not surprisingly, that other income sources are correlated with wages, and that as the targeting attempts to hit too-low wage targets for millionaires, it brings along records with too-low values for other income sources, too.
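A toy illustration of the "brings along other income" mechanism. This is NOT TaxData's stage 2 reweighting, which solves an optimization problem; it is simple proportional scaling of one bin's weights, with made-up record fields (`weight`, `wages`, `capgains`), to show that any weight change made to hit a wage target also moves every other weighted income item in that bin.

```python
def scale_bin_weights(records, wage_target):
    """Toy proportional reweighting: scale all weights in one bin to hit wage_target.

    Returns new record dicts; the input records are left unchanged.
    """
    current = sum(r["weight"] * r["wages"] for r in records)
    factor = wage_target / current
    return [dict(r, weight=r["weight"] * factor) for r in records]


def weighted_total(records, item):
    """Weighted sum of one income item across records."""
    return sum(r["weight"] * r[item] for r in records)
```

In this toy, cutting a bin's weighted wages by 20% cuts its weighted capital gains by exactly 20% too. A real optimizer changes individual weights unevenly, so the spillover won't be exactly proportional, but when other income is correlated with wages it moves in the same direction.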
Because the 2017 distribution will be the jumping-off point for the distribution in later years, wages and other income will be maldistributed in the forecast and presumably too low at the high end. This is likely to lead to significantly incorrect estimates of tax reforms that raise or lower tax rates on very high-income taxpayers.
If I made a mistake in examining this I would much appreciate an early alert, but I have checked it over several times.
Hi @donboyd5, thanks for digging into this! I think you're right that this is a bug, likely introduced when I tried automating the SOI estimates. I'll get a fix up and some checks to keep it from happening again.