PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

Defining filing status #366

Open jdebacker opened 3 years ago

jdebacker commented 3 years ago

This issue was created to discuss the best approaches to defining filing status. taxdata already does a reasonable job of distinguishing filers from non-filers, which is helpful when appending non-filers from the CPS to the PUF. However, Tax-Calculator makes no use of the filer variable (which has since been renamed to data_source). @donboyd5 has recently been trying to impute filing status in order to validate some Tax-Calculator output and has run into issues in accurately defining filer status (see Tax-Calculator Issue #2501).

As I try to get my head around the issue of defining filer status, I wanted to outline where filer status is (or might be) useful in taxdata and Tax-Calculator and ask some questions I have about the issue.

Where knowing filing status is (or could be) useful:

  1. To create a puf.csv file that represents both filers and non-filers. (Note: filing status is currently used this way in taxdata.)
  2. To age the base data, as some targets may represent only the population of tax filers. (Note: filing status is currently used this way in taxdata.)
  3. In Tax-Calculator output (?). For example, it seems like (1) changes in the number of filers might be an important result to look at for a policy reform and (2) it may be that some aggregates depend on knowing who is/isn't a filer. (Note: filing status is currently not used in any way in Tax-Calculator.)

Why determining filing status is difficult:

  1. There are fairly specific rules for who is required to file, but everyone has the option to file. Thus, identifying voluntary filers from data like the CPS is uncertain (presumably all filers in the PUF chose to file, either because they were required to or because they otherwise chose to do so). taxdata takes a partially probabilistic approach to identifying these filers (see here), while @donboyd5's approach here is deterministic, assigning likely filers as filers.
  2. Thresholds for determining filing status change over time under current law (e.g., because of adjustments to the nominal dollar amount that determines the income threshold for filing). While the change in this threshold over time is known (and parameterized through 2018 in filing_rules.json) and thus can be accounted for, I'm not sure if this is done when the base data are aged or extrapolated. Also, if filing status for records is allowed to change over time, aging can become complicated in that there can potentially be interactions between the determination of filing status, weighting, and growth factors in the targeting process when aging data.
  3. If one wants to look at output from Tax-Calculator that depends on filing status (see (3) above), then one needs to know how changes in policy parameters affect filing status.
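
To make the contrast in item 1 concrete, here is a minimal sketch of a deterministic rule next to a partially probabilistic one. It is not the taxdata or @donboyd5 implementation: the threshold values, the `voluntary_prob` parameter, and the function names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical gross-income filing thresholds by unit type.
# Illustrative numbers only; NOT taken from filing_rules.json.
THRESHOLDS = {"single": 10_000, "joint": 20_000}


def required_to_file(income, unit_type, thresholds=THRESHOLDS):
    """Deterministic rule: a unit is a filer if income meets the threshold."""
    return income >= thresholds[unit_type]


def assign_filer(income, unit_type, voluntary_prob, rng):
    """Partially probabilistic rule (assumed form): required filers always
    file; everyone else files voluntarily with probability voluntary_prob."""
    if required_to_file(income, unit_type):
        return True
    return rng.random() < voluntary_prob


rng = random.Random(0)  # fixed seed so the assignment is reproducible
units = [(25_000, "single"), (8_000, "single"), (15_000, "joint")]

deterministic = [required_to_file(inc, kind) for inc, kind in units]
probabilistic = [assign_filer(inc, kind, 0.3, rng) for inc, kind in units]
```

Under the deterministic rule only the first unit is a filer. Under the probabilistic rule the first unit is always a filer, while the other two may or may not be, depending on the random draw.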

Some questions:

  1. @andersonfrailey: Does taxdata assume that once a household is identified as a filer/non-filer in the base year data that it retains that status forever (e.g., even if the blowup factors push gross income over the filing threshold for some future year)? My read is that it is constant, but I could be missing something.
  2. @MattHJensen Can you envision users of Tax-Calculator caring about filing status?
  3. @MattHJensen Should some Tax-Calculator output depend on filing status (e.g., does TC currently compute some tax liability for non-filers?)?
  4. @andersonfrailey When aging/extrapolating data it seems that one would need to account for changes in tax law since the year of the base data file through present (e.g., because of changes in definitions of income/deduction items, rate changes, etc.). Would you agree? How does taxdata handle that? Or are all targeted moments independent of tax law?

donboyd5 commented 3 years ago

My few thoughts:

On your item 4 it seems to me the proper sequence is:

MattHJensen commented 3 years ago

> @MattHJensen Can you envision users of Tax-Calculator caring about filing status?

@jdebacker, I'm not sure what you mean here, but let me try to answer as best I can, and please let me know if this isn't what you are looking for:

It seems many or most Tax-Calculator users should care about filing status indirectly, such as in the preparation of their input data. I can also see why policymakers might care about how policy reforms influence the number of filers and similarly the number of taxpayers required to file. As a policy process observer, though, I've seen much more attention paid to the number of taxpayers with positive IIT or IIT+FICA liability, a related but distinct concept. This came up, for instance, in Mitt Romney's 47% remarks.

> @MattHJensen Should some Tax-Calculator output depend on filing status (e.g., does TC currently compute some tax liability for non-filers?)?

Tax-Calculator computes liabilities for any tax records given to it and then includes all records in its output. I think this is the right thing to do.
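
As a rough illustration of how "number of filers" and "number of units with positive liability" can diverge when every record is kept in the output, here is a small sketch. The record layout and the field names `weight`, `iitax`, and `filer` are assumptions made for this example, not a description of Tax-Calculator's actual output format.

```python
# Hypothetical output records: a sampling weight, a computed income tax
# liability, and a filer flag carried over from the input data.
records = [
    {"weight": 100.0, "iitax": 1500.0, "filer": True},
    {"weight": 250.0, "iitax": 0.0, "filer": True},
    {"weight": 80.0, "iitax": -300.0, "filer": True},   # net refundable credit
    {"weight": 120.0, "iitax": 0.0, "filer": False},
]


def weighted_count(recs, predicate):
    """Sum the weights of all records for which predicate(record) is true."""
    return sum(r["weight"] for r in recs if predicate(r))


n_all = weighted_count(records, lambda r: True)
n_filers = weighted_count(records, lambda r: r["filer"])
n_positive_iit = weighted_count(records, lambda r: r["iitax"] > 0)
```

In this made-up sample, 430 of 550 weighted units are filers, but only 100 have positive income tax liability, which is why the two concepts need to be kept distinct.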

> taxdata takes a partially probabilistic approach to identifying these filers

Note that taxdata has distinct approaches for identifying filers on the PUF and the CPS. The quote here is true for identifying which CPS records are filers or non-filers, which determines whether they are matched to PUF records or added to the file without a PUF match. All PUF-derived records, however, are assumed to be filers in every year.

> On your item 4 it seems to me the proper sequence is:
>
> start with a given year - let's say 2014 to be concrete and let's assume that the steps below have already been done for earlier years - calculate tax law for 2014, calculate filer status for that year under whatever rules you have for that year
>
> ...

Don's sequence makes great sense to me. To follow it exactly, though, I think we'll need to add 2011 and 2012 law to Tax-Calculator or buy a dataset from 2013 or later.

andersonfrailey commented 3 years ago

> Does taxdata assume that once a household is identified as a filer/non-filer in the base year data that it retains that status forever (e.g., even if the blowup factors push gross income over the filing threshold for some future year)? My read is that it is constant, but I could be missing something.

@jdebacker yes. Once taxdata has labeled a household as a filer or non-filer, it keeps that status forever. I'd be interested in maybe integrating Tax-Calculator into our extrapolation routine to calculate taxable income in each year and then using that to determine who would be required to file and who may be doing so voluntarily.
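
A minimal sketch of the difference between carrying the base-year status forward and re-checking it each year is below. It is a simplification under stated assumptions: it compares blown-up gross income to a threshold instead of running Tax-Calculator, and the thresholds, blowup factors, and function names are hypothetical.

```python
def filer_status_fixed(base_status, years):
    """Behavior described above: the base-year label is kept in every year."""
    return {year: base_status for year in years}


def filer_status_recomputed(base_income, blowups, thresholds):
    """Alternative: re-test the filing requirement each year.

    blowups maps year -> cumulative growth factor relative to the base year
    (an assumed convention); thresholds maps year -> filing threshold.
    """
    return {
        year: base_income * factor >= thresholds[year]
        for year, factor in blowups.items()
    }


# Hypothetical household just under the base-year threshold.
base_income = 9_800.0
blowups = {2015: 1.03, 2016: 1.06}         # illustrative factors
thresholds = {2015: 10_200, 2016: 10_300}  # illustrative thresholds

fixed = filer_status_fixed(False, blowups.keys())
recomputed = filer_status_recomputed(base_income, blowups, thresholds)
```

With these made-up numbers the household stays a non-filer in both years under the fixed rule, but crosses the threshold in 2016 once status is recomputed.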

> When aging/extrapolating data it seems that one would need to account for changes in tax law since the year of the base data file through present (e.g., because of changes in definitions of income/deduction items, rate changes, etc.). Would you agree? How does taxdata handle that? Or are all targeted moments independent of tax law?

taxdata doesn't worry about changes in tax law when extrapolating. We kind of outsource that to the CBO projections that are used to calculate growth factors. So if, for example, what counts as capital gains changes, we'll just assume that CBO will bake that into their total capital gains projections and it'll show up in our growth rates.
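
The idea can be sketched roughly as follows. This is a simplified illustration, not the taxdata routine: the projected totals are invented, the function names are hypothetical, and the sketch ignores details such as per-capita adjustments.

```python
# Hypothetical projected aggregate capital gains by year (arbitrary units).
# Any change in what counts as capital gains is assumed to be reflected
# in these totals already.
projected_totals = {2014: 700.0, 2015: 735.0, 2016: 756.0}


def growth_factors(totals):
    """Year-over-year factors: this year's total over the prior year's."""
    years = sorted(totals)
    return {
        year: totals[year] / totals[prev]
        for prev, year in zip(years, years[1:])
    }


def extrapolate(value, base_year, target_year, factors):
    """Grow a base-year record value forward by chaining annual factors."""
    for year in range(base_year + 1, target_year + 1):
        value *= factors[year]
    return value


factors = growth_factors(projected_totals)
gains_2016 = extrapolate(1_000.0, 2014, 2016, factors)
```

Here a record with 1,000 of capital gains in 2014 grows to about 1,080 in 2016, matching the 8 percent growth in the projected aggregate, without the extrapolation code knowing anything about the underlying tax law.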