Open jdebacker opened 3 years ago
My few thoughts:
I think all of the reasons you give for having filing status in taxdata are important; I do think people will care about how a reform changes the number of filers
I thought the required/likely deterministic approach worked well
it seems to me you file because:
I think categories 1 and 2 should be treated as deterministic. You might be able to find some data on the proportion, and maybe the distribution, of required filers who do not file and of people who file based on an erroneous understanding of the rules, but it doesn't seem to me like it is worth a lot of effort. I treated them as deterministic. The only issue I ran into was that I didn't have the time/bandwidth to define the Social Security component of gross income for filing purposes perfectly, although I think I defined it well. I experimented with variants of what I did, and the variants had minuscule impacts on the numbers of filers. I think it is worth being perfect if someone has the time.
I am quite sure that some people in categories 3 and 5 for whom it might make tax-financial sense to file do not do so, even though they could, for several reasons, including (a) fear of immigration issues, and (b) a de minimis potential refund or credit that is not worth the effort. I am not sure how you get information on the distribution of these people in order to do it probabilistically. I treated 3 as deterministic: if Tax-Calculator says you're eligible, then you file. I treated 5 as deterministic but with a de minimis wage threshold: having wages would only drive you to file if you had at least $1k of wages. Obviously this was an arbitrary judgment. This certainly could be done better, for example by examining withholding tables, making a better arbitrary judgment on the amount of withholding paid at different wage levels, and using that to help decide how to set a wage threshold. I would base it on wages in the data rather than withholding in the data, on the assumption that 2011 withholding provides no useful information for the years we care about; besides, we don't have withholding for the universe of people most likely to be brought in, who are created from the CPS. There also might be external information from UI or CPS data that would provide insight into the number of low-wage workers/families in those data vs. the number of low-wage taxpayers in SOI data, which might be helpful in doing this probabilistically. For my purposes I was satisfied with deterministic.
We know there are people in categories 4 and 6 but I ignored them.
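To make the deterministic rule concrete, here is a minimal Python sketch of how I read it. The `Record` fields, the function name, and the example records are hypothetical. Only the $1k de minimis wage threshold comes from the comment above. Categories 1, 2, and 3 enter as precomputed flags, because how they are determined is not spelled out here.

```python
from dataclasses import dataclass

WAGE_DE_MINIMIS = 1_000  # dollars; the arbitrary threshold mentioned above


@dataclass
class Record:
    required_to_file: bool      # categories 1 and 2, assumed precomputed
    eligible_per_taxcalc: bool  # category 3, assumed taken from Tax-Calculator output
    wages: float                # drives category 5


def likely_filer(rec: Record, wage_threshold: float = WAGE_DE_MINIMIS) -> bool:
    """Deterministic filer flag: file if any of the treated categories applies."""
    if rec.required_to_file:
        return True
    if rec.eligible_per_taxcalc:
        return True
    # Category 5: wages only drive filing at or above the de minimis threshold.
    return rec.wages >= wage_threshold


# Hypothetical records, one per case.
records = [
    Record(True, False, 0.0),       # required filer
    Record(False, True, 0.0),       # eligible according to Tax-Calculator
    Record(False, False, 999.0),    # wages below the threshold
    Record(False, False, 1_000.0),  # wages at the threshold
]
flags = [likely_filer(r) for r in records]
```

Categories 4 and 6 are deliberately absent, matching the choice to ignore them. A probabilistic variant would replace the boolean returns with a filing probability, but the data needed to estimate that probability is exactly what is missing.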
On your item 4 it seems to me the proper sequence is:
start with a given year - let's say 2014 to be concrete and let's assume that the steps below have already been done for earlier years - calculate tax law for 2014, calculate filer status for that year under whatever rules you have for that year
decide upon growfactors in moving from 2014 to 2015; presumably some are informed by SOI data, which by definition is only available for filers, and others by data that may only be available for everyone; for convenience or lack of data you may end up having the same growfactors for everyone (I would expect that), although that is not strictly necessary; differentiating them would require knowledge and confidence that are hard to have
grow the dollar values, etc. to 2015
calculate tax for 2015 under rules for that year, calculate filers under rules for that year
repeat until done, meaning you have an unweighted file for each year with tax law for that year and filers for that year
Now develop weights for each year, weighting the filer records so that they approximate what we know in great detail (until the end of published data) about filers, and weighting nonfilers to what we know about them, which probably is far less detail
Presumably (I would advocate) this would be done differently for different years - use the best data you have for targets, until you run out of published targets, and for later years weight to hit a far smaller set of aggregates and impose some distributions (a combination of theoretical and estimated empirical) on the results for the pure forecast years. You might not make the filer/non-filer distinction in pure forecast years.
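The year-by-year part of this sequence could be sketched roughly as follows in Python. Everything here is a placeholder: `age_file`, the record layout, the toy `calc_tax` and `is_filer` callbacks, and the threshold and growfactor values are illustrative assumptions, not taxdata or Tax-Calculator code. The weighting step is a separate later pass and is not shown.

```python
def age_file(records, start_year, end_year, growfactors, calc_tax, is_filer):
    """Return {year: records} with tax and filer status recomputed each year."""
    files = {}
    current = [dict(r) for r in records]
    for year in range(start_year, end_year + 1):
        if year > start_year:
            # Grow dollar values from the previous year to this one.
            g = growfactors[year]
            current = [{**r, "income": r["income"] * g} for r in current]
        # Apply that year's law and that year's filing rules.
        current = [
            {**r, "tax": calc_tax(r, year), "filer": is_filer(r, year)}
            for r in current
        ]
        files[year] = current
    return files


# Illustrative placeholders, not real parameters.
thresholds = {2014: 10_150, 2015: 10_300}
growfactors = {2015: 1.05}


def calc_tax(rec, year):
    # Toy tax: 10% of income above the year's threshold.
    return max(0.0, rec["income"] - thresholds[year]) * 0.10


def is_filer(rec, year):
    # Toy filing rule: income at or above the year's threshold.
    return rec["income"] >= thresholds[year]


files = age_file([{"income": 10_000.0}], 2014, 2015,
                 growfactors, calc_tax, is_filer)
```

In this toy run the single record is a non-filer in 2014 and becomes a filer in 2015 once its grown income crosses the 2015 threshold, which is the kind of status change the sequence is meant to allow.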
@MattHJensen Can you envision users of Tax-Calculator caring about filing status?
@jdebacker, I'm not sure what you mean here, but let me try to answer the best I can and please let me know if this isn't what you are looking for:
It seems many or most Tax-Calculator users should care about filing status indirectly, such as in the preparation of their input data. I can also see why policymakers might care about how policy reforms influence the number of filers and similarly the number of taxpayers required to file. As a policy process observer, though, I've seen much more attention paid to the number of taxpayers with positive IIT or IIT+FICA liability, a related but distinct concept. This came up, for instance, in Mitt Romney's 47% remarks.
@MattHJensen Should some Tax-Calculator output depend on filing status (e.g., does TC currently compute some tax liability for non-filers?)?
Tax-Calculator computes liabilities for any tax records given to it and then includes all records in its output. I think this is the right thing to do.
taxdata takes a partially probabilistic approach to identifying these filers
Note that taxdata has distinct approaches for identifying filers on the PUF and the CPS. The quote here is true for identifying which CPS records are filers or non-filers, for the purpose of deciding whether they should be matched to PUF records or added to the file without a PUF match. All PUF-derived records, however, are assumed to be filers in every year.
On your item 4 it seems to me the proper sequence is:
start with a given year - let's say 2014 to be concrete and let's assume that the steps below have already been done for earlier years - calculate tax law for 2014, calculate filer status for that year under whatever rules you have for that year
...
Don's sequence makes great sense to me. To follow it exactly, though, I think we'll need to add 2011 and 2012 law to Tax-Calculator or buy a dataset from 2013 or later.
Does taxdata assume that once a household is identified as a filer/non-filer in the base year data that it retains that status forever (e.g., even if the blowup factors push gross income over the filing threshold for some future year)? My read is that it is constant, but I could be missing something.
@jdebacker yes. Once taxdata has labeled a household as filer/non-filer, they keep that status forever. I'd be interested in maybe integrating tax-calc into our extrapolation routine to calculate taxable income in each year and then use that to determine who would be required to file and who may be doing so voluntarily.
When aging/extrapolating data it seems that one would need to account for changes in tax law since the year of the base data file through present (e.g., because of changes in definitions of income/deduction items, rate changes, etc.). Would you agree? How does taxdata handle that? Or are all targeted moments independent of tax law?
taxdata doesn't worry about changes in tax law when extrapolating. We kind of outsource that to the CBO projections that are used to calculate growth factors. So if, for example, what counts as capital gains changes, we'll just assume that CBO will bake that into their total capital gains projections and it'll show up in our growth rates.
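As a rough illustration of the idea, a growth factor can be read as the ratio of consecutive projected totals. The Python sketch below assumes exactly that. The function name and the projected totals are made up, and the actual taxdata calculation may differ (for example, by adjusting for population growth).

```python
def growth_factors(projected_totals):
    """Year-over-year ratios of projected totals, keyed by the later year.

    Assumes the projection covers consecutive years.
    """
    years = sorted(projected_totals)
    return {
        year: projected_totals[year] / projected_totals[prev]
        for prev, year in zip(years, years[1:])
    }


# Made-up projected capital gains totals (billions of dollars).
projected_capital_gains = {2014: 700.0, 2015: 721.0, 2016: 757.05}
factors = growth_factors(projected_capital_gains)
```

If a law change alters what counts as capital gains in 2016, the projected 2016 total already reflects it, so the 2016 factor picks it up without any explicit tax-law logic in the extrapolation.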
This issue was created to discuss the best approaches to defining filing status.
taxdata already does a reasonable job of defining non-filers and filers that is helpful in appending non-filers to the PUF using CPS data. However, Tax-Calculator makes no use of the filer variable (and it has been renamed to data_source). @donboyd5 has recently been trying to impute filing status for the purposes of validating some Tax-Calculator output and has run into some issues in accurately defining filer status (see Tax-Calculator Issue #2501).
As I try to get my head around the issue of defining filer status, I wanted to outline where filer status is (or might be) useful in taxdata and Tax-Calculator and ask some questions I have about the issue.
Where knowing filing status is (or could be) useful:
puf.csv file that represents both filers and non-filers. (Note: filing status is currently used this way in taxdata.)
taxdata)
Why determining filing status is difficult:
taxdata takes a partially probabilistic approach to identifying these filers (see here), while @donboyd5's approach here takes a deterministic view of assigning likely filers as filers.
filing_rules.json) and thus can be accounted for, I'm not sure if this is done when the base data are aged or extrapolated. Also, if filing status for records is allowed to change over time, aging can become complicated in that there can potentially be interactions between the determination of filing status, weighting, and growth factors in the targeting process when aging data.
Some questions:
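Does taxdata assume that once a household is identified as a filer/non-filer in the base year data that it retains that status forever (e.g., even if the blowup factors push gross income over the filing threshold for some future year)? My read is that it is constant, but I could be missing something.
When aging/extrapolating data it seems that one would need to account for changes in tax law since the year of the base data file through present (e.g., because of changes in definitions of income/deduction items, rate changes, etc.). Would you agree? How does taxdata handle that? Or are all targeted moments independent of tax law?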