Closed MaxGhenis closed 5 years ago
Thanks for the suggestions, @MaxGhenis. This seems reasonable. @martinholmer, since you handled the imputations that use FDED, I'd also like to hear your thoughts on it.
In taxdata issue #306, @andersonfrailey asked @martinholmer:
since you handled the imputations that use FDED, I'd also like to hear your thoughts on [removing FDED].
I'm confused. The imputations of itemized expense amounts for those who do not itemize seem not to use the FDED variable. Did I miss something?
However, the construction of the cmbtp variable, which predates my participation in the Tax-Calculator and taxdata projects, does require the FDED variable. So, no @MaxGhenis, the FDED variable cannot be dropped from a synthetic PUF.
@martinholmer
I'm confused. The imputations of itemized expense amounts for those who do not itemize seem not to use the FDED variable. Did I miss something?
Ah, sorry, I mixed up the construction of cmbtp with the itemized imputations. I thought you had worked on the former. My mistake.
Also, I think @MaxGhenis is suggesting we use Tax-Calc to determine whether someone filed as an itemizer or took the standard deduction, essentially creating FDED in TaxData rather than relying on FDED from the file. As I think about it more, we couldn't use Tax-Calculator for this because the PUF is from 2011 and the earliest policy year we have in Tax-Calc is 2013.
That being said, if memory serves, records in the PUF who took the standard deduction will have zero values for all itemized deduction variables. We could then determine who itemized based on whether or not they have all zero values for the various deductions. However, this wouldn't work with the synthetic PUF if itemized deductions are synthesized for everyone.
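The all-zero check described above could look roughly like the following. This is only a minimal sketch: the field names in ITEMIZED_FIELDS are hypothetical placeholders, not the actual PUF variable names, and the premise that standard-deduction filers have zeros in every itemized field is the recollection stated above, not something verified here.

```python
# Hypothetical names for the itemized deduction fields; the real PUF
# variable names would need to be substituted here.
ITEMIZED_FIELDS = ("medical", "state_local_taxes", "mortgage_interest", "charity")


def infer_itemizer(record, fields=ITEMIZED_FIELDS):
    """Return True if any itemized deduction field in the record is nonzero.

    Assumes records that took the standard deduction carry zeros in every
    itemized field. Missing fields are treated as zero.
    """
    return any(record.get(name, 0) != 0 for name in fields)


# Two made-up records for illustration only.
records = [
    {"medical": 0, "state_local_taxes": 0, "mortgage_interest": 0, "charity": 0},
    {"medical": 0, "state_local_taxes": 4200, "mortgage_interest": 9100, "charity": 500},
]
flags = [infer_itemizer(r) for r in records]
```

As noted, this check stops being informative once itemized deductions are synthesized for every record, because nearly all records would then have nonzero values.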
@andersonfrailey said:
I mixed up the construction of cmbtp with the itemized imputations. I thought you had worked on the former. My mistake.
No problem. But the important point is that FDED cannot be dropped from the PUF file used as input into taxdata logic. If those working on the synthetic PUF can figure out a way to calculate cmbtp in taxdata without having the FDED variable available, then they can submit a taxdata pull request that does that. If they can do that, we would be able to drop the use of the FDED variable.
@MaxGhenis @donboyd5
Also, I think @MaxGhenis is suggesting we use Tax-Calc to determine whether or not someone filed as an itemizer or took the standard deduction to essentially create FDED in TaxData rather than relying on FDED from the file. As I think about it more, we couldn't use Tax-Calculator for this because the PUF is from 2011 and the earliest policy year we have in Tax-Calc is 2013.
I don't know the full mechanics of how FDED is used in taxdata, but why would a tax unit's deduction choice as of 2011 be relevant for tax analysis starting in 2013? Fundamentally it seems like this logic can be pushed to Tax-Calculator, including possibly the cmbtp field if that's the blocker.
cc @feenberg
We discussed this on yesterday's call and concluded that FDED remains necessary for backing out the cmbtp field. Thanks @martinholmer for the explanation; we'll synthesize it.
taxdata currently requires the FDED field (itemized vs standard deduction). This has spurred discussion in the synthetic PUF project, since it means we have to synthesize it. It seems like it would be better to infer which deduction a tax unit takes based on which minimizes their tax liability, either in taxdata or (probably) Tax-Calculator. https://github.com/PSLmodels/taxdata/blob/1f5f317e37b41a233efd75fb94f436f923b3e2d1/puf_data/finalprep.py#L53
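To make the suggestion concrete, here is a minimal sketch of inferring the deduction choice by comparing liability under each option. Everything in it is an assumption for illustration: choose_deduction and flat_tax are hypothetical helpers, not functions from taxdata or Tax-Calculator, and a real implementation would call the actual tax logic rather than a flat rate.

```python
def flat_tax(taxable_income, rate=0.2):
    """Hypothetical stand-in for a real liability calculation."""
    return rate * taxable_income


def choose_deduction(income, itemized_total, standard_deduction, tax_fn=flat_tax):
    """Return the deduction type that yields the lower liability.

    Ties go to the standard deduction. Taxable income is floored at zero.
    """
    tax_itemized = tax_fn(max(income - itemized_total, 0.0))
    tax_standard = tax_fn(max(income - standard_deduction, 0.0))
    return "itemized" if tax_itemized < tax_standard else "standard"


# Made-up amounts for illustration only.
choice_a = choose_deduction(50_000, itemized_total=15_000, standard_deduction=6_000)
choice_b = choose_deduction(50_000, itemized_total=3_000, standard_deduction=6_000)
```

Passing the liability function in as a parameter keeps the comparison separate from any particular tax law, so the same selection logic could sit in front of whichever calculator is used.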