Proper mapping of PUF e18400 and e18500 to IRS-reported SALT aggregates

donboyd5 commented 4 months ago

@nikhilwoodruff, FYI @martinholmer

As mentioned on the phone today, I think tax-microdata-benchmarking currently maps the PUF SALT variable e18400 incorrectly to IRS-published total SALT rather than to the combined PIT-sales tax SALT component. However, it is more complicated than I realized at the time; here are the full details.

The PUF 2015 documentation (p.8) lists relevant variables as follows:

On page A-7 it shows the variables on 2015 Schedule A:

e18400 is combined PIT and sales tax, while e18500 is real estate taxes.

On page 48, the PUF 2015 documentation gives us ERRONEOUS NUMBERS for (1) the official IRS totals for these variables, in the column "2015 Full SOI Individual Sample" ($35.270 billion and $18.861 billion), and (2) the weighted sums we should expect in the 2015 PUF, including the 4 aggregate records, in the column "2015 Public Use Sample" (which round to the same numbers as the full sample). These numbers are wrong and appear to be missing a digit.

The correct numbers for the Full Sample are in the IRS file 15in14ar.xls, in BT10 and BY10, which are $352.701 billion and $188.606 billion (the PUF 2015 documentation appears to have several other missing-digit errors for other itemized deductions, so caveat emptor):

This tells us the proper mapping and approximately what we should expect if we summarize the 2015 PUF and we don't exclude the 4 aggregate records.
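A quick arithmetic check is consistent with the missing-digit reading: each documentation figure is almost exactly one tenth of the corresponding IRS figure. This is only a sketch using the four numbers quoted above; the variable names are mine, and pairing the first number with e18400 and the second with e18500 follows the order in which they are listed.

```python
# Amounts in $ billions, as quoted in this comment.
doc_values = {"e18400": 35.270, "e18500": 18.861}    # PUF 2015 documentation, p. 48
irs_values = {"e18400": 352.701, "e18500": 188.606}  # 15in14ar.xls, cells BT10 and BY10

# Ratio of the IRS figure to the documentation figure for each variable.
ratios = {var: irs_values[var] / doc_values[var] for var in doc_values}

for var, ratio in ratios.items():
    # A ratio of about 10 is what a dropped trailing digit would produce.
    print(f"{var}: IRS / documentation = {ratio:.3f}")
```

Both ratios come out within a small rounding error of 10.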

Now that we know the proper mapping, we can worry about how to target the PUF in 2021. Here are the IRS published totals for 2021, from 21in14ar.xls:

So we should target 2021 values for filers for e18400 of $258.640 billion and for e18500, $99.984 billion.

I have updated my IRS targets repo to include the distributional detail. The data folder includes an updated potential_targets.csv file with separate variables id_pitgst (itemized deductions for PIT and general sales tax), id_retax (real estate tax), and their number-of-returns counterparts nret....

donboyd5 commented 4 months ago

@nikhilwoodruff

For completeness, here's the linkage between puf variables and the variable names I've given to IRS targets in the "potential_targets.csv" file:

e18400 -- maps to id_pitgst (itemized deductions for personal income tax and general sales tax)
e18500 -- maps to id_retax (itemized deductions for real estate tax)

And also, the number of returns with each item (pseudocode):

(e18400 ~= 0) * s006 -- maps to nret_id_pitgst
(e18500 ~= 0) * s006 -- maps to nret_id_retax
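In Python, the pseudocode above could be sketched roughly as follows. The three records are made up for illustration, the helper names are hypothetical, and I'm assuming s006 is already expressed in units of returns; the pseudocode gives per-record values, and the target is their sum over all records.

```python
# Hypothetical records: e18400/e18500 in dollars, s006 is the record weight
# (assumed here to already be in units of returns).
records = [
    {"e18400": 5000.0, "e18500": 3000.0, "s006": 100.0},
    {"e18400": 0.0, "e18500": 2000.0, "s006": 250.0},
    {"e18400": 1200.0, "e18500": 0.0, "s006": 50.0},
]


def weighted_amount(rows, var):
    """Weighted dollar total of `var` across all records."""
    return sum(r[var] * r["s006"] for r in rows)


def weighted_count(rows, var):
    """Weighted number of returns with a nonzero value of `var`."""
    return sum(r["s006"] for r in rows if r[var] != 0)


summary = {
    "id_pitgst": weighted_amount(records, "e18400"),
    "id_retax": weighted_amount(records, "e18500"),
    "nret_id_pitgst": weighted_count(records, "e18400"),
    "nret_id_retax": weighted_count(records, "e18500"),
}
print(summary)
```

The amounts are in dollars here, so they would need to be scaled to match whatever units the targets file uses.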

MaxGhenis commented 4 months ago

Have you reported these inaccuracies to SOI?

donboyd5 commented 4 months ago

No, I should. I'll get around to it, but if you talk to them first, please feel free to let them know.

nikhilwoodruff commented 4 months ago

In #130 - E18400 in our TMD file sums to 234bn (/258bn, 10% too low) and E18500 sums to 97bn (/99bn, on target).
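For reference, here is the same comparison against the unrounded 2021 IRS targets quoted earlier in this issue. It is only a sketch: the TMD sums are the rounded figures from this comment, and the dictionary names are illustrative.

```python
# Amounts in $ billions.
targets_2021 = {"e18400": 258.640, "e18500": 99.984}  # IRS 21in14ar.xls, as quoted above
tmd_sums = {"e18400": 234.0, "e18500": 97.0}          # rounded sums reported for PR #130

# Percentage difference of each TMD sum from its IRS target.
pct_diff = {
    var: 100.0 * (tmd_sums[var] / targets_2021[var] - 1.0) for var in targets_2021
}

for var, diff in pct_diff.items():
    print(f"{var}: {diff:+.1f}% relative to target")
```

With these inputs, e18400 is about 9.5% below target and e18500 about 3.0% below.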

martinholmer commented 4 months ago

@nikhilwoodruff said in issue #118:

In https://github.com/PSLmodels/tax-microdata-benchmarking/pull/130 - E18400 in our TMD file sums to 234bn (/258bn, 10% too low) and E18500 sums to 97bn (/99bn, on target).

Apparently, the combined SALT sum of $331B (=234+97) results from using a reweighting penalty of 0.2 (the default in PR #130).

When there is no penalty (on the master branch), we get these results after the reweighting:

>C weighted puf SALT ($B)= 422.957 [?]
>C weighted puf SALT (#M)= 16.658 [14.3...27.1]

So, 423 is quite a bit more than 331.
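To put both figures on the same footing, they can be compared with the sum of the two 2021 IRS targets given earlier in this issue ($258.640B + $99.984B). This is a rough sketch only: it assumes the "weighted puf SALT" line measures the same thing as e18400 + e18500, which has not been confirmed here, and the names are illustrative.

```python
# Amounts in $ billions.
combined_target = 258.640 + 99.984  # 2021 IRS targets for e18400 + e18500

salt_sums = {
    "penalty 0.2 (PR #130)": 234.0 + 97.0,  # rounded sums quoted above
    "no penalty (master)": 422.957,         # assumed comparable; not confirmed
}

# Percentage difference of each combined sum from the combined target.
pct_vs_target = {
    label: 100.0 * (value / combined_target - 1.0)
    for label, value in salt_sums.items()
}

for label, diff in pct_vs_target.items():
    print(f"{label}: {diff:+.1f}% relative to {combined_target:.3f}")
```

Under that assumption, the penalized run is roughly 8% below the combined target and the unpenalized run roughly 18% above it.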

donboyd5 commented 4 months ago

@martinholmer, can you elaborate?

The way I read your comment is that the 0.2 penalty improved the variables we input into Tax-Calculator, which is (I think) a good thing. If I understood what @nikhilwoodruff said, our tax expenditure estimate is still far off with these improved input values, which means we need to look for other reasons for being off - including the distribution of the uncapped SALT deduction, the numbers of itemizers, which is affected by many things, and the amounts of other itemized deductions.

Is that how you interpret it?

martinholmer commented 4 months ago

@donboyd5 asked in issue #118:

Is that how you interpret it?

I'm not sure; this is a complicated topic.

While the penalty weight of 0.2 may help the SALT variables, it pretty much ruins many other tax expenditures.

The one thing I feel strongly about is that when the reweighting moves things around by a lot, it is likely a sign that the underlying data are not ideal, as we found in the QBID case, where some of the business income variables were misdefined.

Looking more into the SALT-related variables themselves, and how they are constructed, seems to be a more fruitful approach than experimenting with different reweighting penalty values.

martinholmer commented 4 months ago

I think issue #118 has been resolved by the changes in PR #130, which were cherry-picked into the merged PR #144.

donboyd5 commented 4 months ago

I agree. Thank you.

On Tue, Jul 16, 2024 at 9:56 AM Martin Holmer @.***> wrote:

I think issue #118 https://github.com/PSLmodels/tax-microdata-benchmarking/issues/118 has been resolved by the changes in PR #130 https://github.com/PSLmodels/tax-microdata-benchmarking/pull/130, which were cherry-picked into the merged PR #144 https://github.com/PSLmodels/tax-microdata-benchmarking/pull/144.

