PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

Question: How can we define "taxpayers" using tax-calculator output data in the same way that IRS defines them in Publication 1304 and associated tables? #37

Closed donboyd5 closed 2 months ago

donboyd5 commented 5 months ago

@martinholmer @nikhilwoodruff

It's important for us to be able to define "taxpayers" using Tax-Calculator output variables in the same way that the IRS defines taxpayers:

IRS defines taxpayers, for purposes of its published tables, using a concept it calls "Total Income Tax". See Publication 1304 for 2020, pp. 320-321. (The Publication 1304 PDF for 2021 is not yet available, although the 2021 spreadsheets are.) See the full definitions further below, including the discussion of "Taxable and Nontaxable Returns".

I'd like to match, or come close to matching, the IRS definition of taxable returns using Tax-Calculator output variables, without doing a crazy amount of work.

As I read the IRS definitions, it seems to me a taxable return is one that has a positive amount of total income tax (not including payroll taxes and not including taxes relating to prior years), after allowing for all credits, including refundable credits. Does that seem like a correct reading of the IRS definitions?

If so, how best to match this to tax-calculator output variables? When I read the definition for iitax in the tax-calculator user guide in conjunction with the discussion here based on 2014-2016 tax returns, it seems to get at approximately the right tax before credits concept. It seems like I should then subtract non-refundable credits (c07100) to get a tax-after-credits concept similar to IRS Total Income Tax. Then, I should look to see whether refundable credits (refund) would reduce that to <= zero. After all this, records with Total Income Tax ~= iitax - c07100 <= 0 would be nontaxpayers, as would records where Total Income Tax - refund <= 0.
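To make the proposed rule concrete, here is a minimal Python sketch. It takes the reading above at face value, treating `iitax` as a before-credits amount, which should be checked against the variable definitions in the Tax-Calculator user guide. The function name `is_taxable_proposed` is hypothetical; only the variable names `iitax`, `c07100`, and `refund` come from Tax-Calculator.

```python
def is_taxable_proposed(iitax, c07100, refund):
    """Classify one record under the rule proposed in this comment.

    ASSUMPTION: iitax is treated as tax before credits, as read above;
    this has not been verified against the IRS tables.
    """
    # Approximate IRS "Total Income Tax": tax after nonrefundable credits.
    total_income_tax = iitax - c07100
    # Refundable credits can still push the record to nontaxable.
    after_refundable = total_income_tax - refund
    return total_income_tax > 0 and after_refundable > 0


# Illustrative records with made-up amounts, not real data.
records = [
    {"iitax": 1000.0, "c07100": 200.0, "refund": 300.0},
    {"iitax": 1000.0, "c07100": 200.0, "refund": 900.0},
    {"iitax": 100.0, "c07100": 200.0, "refund": 0.0},
]
flags = [is_taxable_proposed(**r) for r in records]
```

Because `refund` is never negative, the second condition implies the first; both are kept only to mirror the two steps described above.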

Does that seem right? Is there a better or simpler way to construct a definition of taxpayer using Tax-Calculator output?

IRS Definitions (from Publication 1304, 2020, pp. 320-321)

Taxable and Nontaxable Returns

The taxable and nontaxable classification of a return for this report is determined by the presence of “total income tax.” Some returns classified as “nontaxable” may have had a liability for other taxes, such as excess advance premium tax credit (APTC) repayment, self-employment tax, uncollected employee Social Security and Medicare tax on tips, tax from recomputing prior-year investment credit, penalty taxes on individual retirement accounts, section 72 penalty taxes, household employment taxes, Additional Medicare Taxes, or golden parachute payments. These taxes, however, were disregarded for the purposes of this classification, since four of the above taxes were considered Social Security (rather than income) taxes, and the remaining ones either were based on prior year’s income or were penalty taxes. The APTC repayment was not an income tax but a repayment of money previously advanced to taxpayers for paying for health insurance purchased on a health care exchange. Net Investment Income Tax from Form 8960 was added to income tax after credits to create income tax.

For this report, the earned income credit, additional child credit, American opportunity credit, premium tax credit, regulated investment company credit, and health coverage credit are treated first as an amount used to offset income tax before credits. Since they were refundable, they were subtracted from income tax (for the statistics) after reduction by all other statutory credits. As a result, some returns became nontaxable strictly because of the refundable credits when the refundable credits equaled or exceeded income tax before credits reduced by any other credits.

Total Income Tax

(line 24, Form 1040 - any excess advance premium tax credit (APTC) repayment on line 2, Schedule 2, + any Net Investment Income Tax on line 8b, Schedule 2, + any Form 4970 tax on line 8c, Schedule 2 - line 27, Form 1040 - line 28, Form 1040 - line 29, Form 1040 - line 12a, Schedule 3 - line 12c, Schedule 3 - portion on line 12d, Schedule 3, related to tax on repatriated income, limited to zero.)

“Total income tax” was the sum of income tax after credits (including the subtraction of the excess APTC repayment, earned income credit, additional child tax credit, American opportunity credit, regulated investment company credit, health coverage credit, recovery rebate credit, and qualified sick and family leave credit) less any deferred tax on repatriated (965) income plus the Net Investment Income Tax from Form 8960 and the tax from Form 4970. It did not include any of the other taxes that made up total tax liability. Total income tax was the basis for classifying returns as taxable or nontaxable.
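As a rough illustration of the arithmetic in the IRS formula quoted above, here is a hedged Python sketch. It reads the dash after "line 24, Form 1040" as a minus sign, consistent with "including the subtraction of the excess APTC repayment". The function and parameter names are hypothetical labels for the form lines named in the quoted text, not Tax-Calculator or IRS identifiers.

```python
def total_income_tax(line24, aptc_repayment, niit, form4970_tax,
                     line27, line28, line29,
                     sched3_12a, sched3_12c, sched3_12d_repatriation):
    """Total Income Tax as transcribed from the quoted IRS formula.

    ASSUMPTION: each argument is the dollar amount on the form line named
    in the quoted definition; the names themselves are made up here.
    """
    amount = (line24
              - aptc_repayment           # line 2, Schedule 2
              + niit                     # line 8b, Schedule 2
              + form4970_tax             # line 8c, Schedule 2
              - line27 - line28 - line29 # Form 1040
              - sched3_12a - sched3_12c  # Schedule 3
              - sched3_12d_repatriation) # portion of line 12d, Schedule 3
    return max(amount, 0.0)              # "limited to zero"


def is_taxable_return(**lines):
    # A return is "taxable" when total income tax is present (positive).
    return total_income_tax(**lines) > 0
```

The floor at zero is what makes a return nontaxable when the refundable credits equal or exceed the remaining tax, as described under "Taxable and Nontaxable Returns".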

MaxGhenis commented 5 months ago

That sounds right based on information I collected in https://github.com/PolicyEngine/policyengine-us/discussions/4283

donboyd5 commented 5 months ago

Thanks, @MaxGhenis! BTW, sorry I never responded to PolicyEngine/policyengine-us#4283. I think that's a smart idea and is the reason for this issue.

I know from looking at the data and from talking to @nikhilwoodruff that if we only target taxable returns, we lose some detailed AGI bins that are available for filers but not for taxpayers. However, IRS Table 1.1 has details for certain variables for taxpayers for the same income bins as for filers, so some of what we need can be had for detailed income ranges for high-income taxpayers, not just filers.

If you are looking for a computer-friendly data set with IRS aggregates for 2015 and 2021 for Tables 1.1, 1.2, 1.4, and 2.1, you can find it here. If you want it as a CSV file, it's in the data folder of the repo that created the web page. I only put some of the data from Table 1.1 in the web page and CSV file but can go back and get more; it's pretty automated.

martinholmer commented 2 months ago

Given the finding, in PR #148, that using "taxable return" statistics in TMD reweighting causes substantial distortions in the post-reweighted sample, do you want to add a comment and/or close issue #37 (which you raised in early April)?

donboyd5 commented 2 months ago

For two reasons, using IRS-published aggregates for taxable returns as diagnostic information and as potential targets has not worked out as we had hoped. First, we have not yet deciphered precisely how to define taxable returns, using PUF variables, in a way that conforms to the IRS definition in published tables. This may be resolvable, but it would need more focused effort on our part than we currently have time for. Second, in very low income ranges, few records are available for the definition of taxable returns we have been using. This puts a lot of pressure on those records for targeting, particularly if they differ in meaningful ways from the records in the IRS SOI full sample that is used to create published tables. This second reason might or might not be resolvable.

With more time, it would make sense to return to this question, but we have higher priorities at the moment.