PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

Don't require FDED as input #306

MaxGhenis closed this issue 5 years ago

MaxGhenis commented 5 years ago

taxdata currently requires the FDED field (itemized vs. standard deduction). This has spurred discussion in the synthetic PUF project, since it means we have to synthesize it. It seems better to infer which deduction a tax unit takes based on which one minimizes its tax liability, either in taxdata or (probably) in Tax-Calculator.


https://github.com/PSLmodels/taxdata/blob/1f5f317e37b41a233efd75fb94f436f923b3e2d1/puf_data/finalprep.py#L53
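A minimal sketch of the tax-minimizing inference suggested above, in Python. The function `infer_deduction_choice`, its `tax_fn` argument, and the 20% `flat_tax` are hypothetical illustrations, not part of taxdata or Tax-Calculator. A real version would run the full tax calculation for the relevant year (including interactions such as the AMT), which this sketch ignores.

```python
from typing import Callable


def infer_deduction_choice(
    agi: float,
    itemized_total: float,
    standard_deduction: float,
    tax_fn: Callable[[float], float],
) -> str:
    """Hypothetical helper: pick the deduction that minimizes tax liability."""
    # Taxable income under each option, floored at zero.
    taxable_itemized = max(0.0, agi - itemized_total)
    taxable_standard = max(0.0, agi - standard_deduction)
    # Liability under each option, using the supplied (hypothetical) tax function.
    tax_itemized = tax_fn(taxable_itemized)
    tax_standard = tax_fn(taxable_standard)
    # Choose the option with lower liability; ties default to the standard deduction.
    return "itemized" if tax_itemized < tax_standard else "standard"


def flat_tax(taxable_income: float) -> float:
    """Hypothetical 20% flat tax, for illustration only."""
    return 0.2 * taxable_income
```

For example, with AGI of 60,000, itemized deductions of 15,000, and a standard deduction of 10,000, `infer_deduction_choice(60000.0, 15000.0, 10000.0, flat_tax)` returns `"itemized"`. Passing the tax function in as an argument keeps the choice logic separate from any particular year's tax law, which is the crux of the 2011 vs. 2013 policy-year question raised later in the thread.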

andersonfrailey commented 5 years ago

Thanks for the suggestion, @MaxGhenis. This seems reasonable. @martinholmer, since you handled the imputations that use FDED, I'd also like to hear your thoughts on it.

martinholmer commented 5 years ago

In taxdata issue #306, @andersonfrailey asked @martinholmer:

> since you handled the imputations that use FDED, I'd also like to hear your thoughts on [removing FDED].

I'm confused. The imputations of itemized expense amounts for those who do not itemize do not seem to use the FDED variable. Did I miss something?

However, the construction of the cmbtp variable, which predates my participation in the Tax-Calculator and taxdata projects, does require the FDED variable. So, no, @MaxGhenis, the FDED variable cannot be dropped from a synthetic PUF.

andersonfrailey commented 5 years ago

@martinholmer

> I'm confused. The imputations of itemized expense amounts for those who do not itemize do not seem to use the FDED variable. Did I miss something?

Ah, sorry, I mixed up the construction of cmbtp with the itemized imputations. I thought you had worked on the former. My mistake.

andersonfrailey commented 5 years ago

Also, I think @MaxGhenis is suggesting that we use Tax-Calc to determine whether someone itemized or took the standard deduction, essentially creating FDED in TaxData rather than relying on FDED from the file. Thinking about it more, we couldn't use Tax-Calculator for this because the PUF is from 2011 and the earliest policy year in Tax-Calc is 2013.

That said, if memory serves, PUF records for filers who took the standard deduction have zero values for all itemized deduction variables. We could then determine who itemized based on whether a record has all zero values for those deductions. However, this wouldn't work with the synthetic PUF if itemized deductions are synthesized for everyone.
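A sketch of the zero-value heuristic described above, assuming each record is a dict keyed by lowercase variable name. The field list uses Tax-Calculator input names for common itemized deductions (medical, state and local income tax, real estate tax, interest paid, cash and non-cash charitable gifts, miscellaneous); treat it as an illustrative assumption, not the complete set of PUF itemized-deduction variables. `infer_itemizer` is a hypothetical helper.

```python
from typing import Mapping

# Assumed (possibly incomplete) list of itemized-deduction input variables.
ITEMIZED_FIELDS = (
    "e17500",  # medical and dental expenses
    "e18400",  # state and local income/sales taxes
    "e18500",  # real estate taxes
    "e19200",  # interest paid
    "e19800",  # cash charitable contributions
    "e20100",  # non-cash charitable contributions
    "e20400",  # miscellaneous deductions
)


def infer_itemizer(record: Mapping[str, float]) -> bool:
    """Hypothetical helper: treat a record as an itemizer if any itemized field is nonzero."""
    # Missing fields are treated as zero.
    return any(record.get(field, 0) != 0 for field in ITEMIZED_FIELDS)
```

For example, `infer_itemizer({"e19200": 8000.0})` returns `True`, while a record whose itemized fields are all zero or absent returns `False`.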

martinholmer commented 5 years ago

@andersonfrailey said:

> I mixed up the construction of cmbtp with the itemized imputations. I thought you had worked on the former. My mistake.

No problem. But the important point is that FDED cannot be dropped from the PUF file used as input to the taxdata logic. If those working on the synthetic PUF can figure out a way to calculate cmbtp in taxdata without the FDED variable, they can submit a taxdata pull request that does so. If they can do that, we would be able to drop the use of the FDED variable.

@MaxGhenis @donboyd5

MaxGhenis commented 5 years ago

@andersonfrailey said:

> Also, I think @MaxGhenis is suggesting that we use Tax-Calc to determine whether someone itemized or took the standard deduction, essentially creating FDED in TaxData rather than relying on FDED from the file. Thinking about it more, we couldn't use Tax-Calculator for this because the PUF is from 2011 and the earliest policy year in Tax-Calc is 2013.

I don't know the full mechanics of how FDED is used in taxdata, but why would a tax unit's deduction choice as of 2011 be relevant for tax analysis starting in 2013? Fundamentally, it seems like this logic could be pushed to Tax-Calculator, possibly including the cmbtp field if that's the blocker.

cc @feenberg

MaxGhenis commented 5 years ago

We discussed this on yesterday's call and concluded that FDED remains necessary for backing out the cmbtp field. Thanks, @martinholmer, for the explanation; we'll synthesize it.