Thanks for pointing this out @martinholmer. I dug a little deeper and all of the filing units where this is an issue are non-filers, which means they come from the CPS, not the PUF.
When we add non-filers to `puf.csv`, we set income variables that weren't found when creating the CPS tax units to zero. In this case, `e01500` is set to zero while `e01700` is set to the value of pensions found in the CPS. You can see this in these lines of code in `add_nonfilers.py`. This goes back to when we were producing the PUF using SAS scripts.
This isn't an issue in `cps.csv` because `e01500` is set as the value of pensions in the CPS (rtm-val) and `e01700` is derived from that, as can be seen in `cps_data/finalprep.py`.
Given that the variable used to determine `e01500` in the CPS is described as "Recode total amount of retirement income received" in the documentation, I believe this is the correct way to define the two variables.
Now that you've brought this to my attention, the solution seems to be to modify the matching scripts so that `e01500` is equal to the value of pensions reported in the CPS and `e01700` is derived from that. Because non-filers are not used in the stage 2 process, this does not require also creating a new weights file.
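Roughly, the change for non-filers would look something like the sketch below. This is not the actual matching-script code: the function name and the `taxable_share` parameter are hypothetical, and how `e01700` is really derived from `e01500` is not spelled out in this thread. The point is only that deriving taxable pensions from total pensions keeps `e01700 <= e01500` by construction.

```python
def nonfiler_pensions(cps_pension, taxable_share=1.0):
    """Return (e01500, e01700) for a non-filer unit.

    cps_pension   -- pension amount reported in the CPS
    taxable_share -- HYPOTHETICAL fraction of total pensions treated as
                     taxable; the real derivation may differ
    """
    if not 0.0 <= taxable_share <= 1.0:
        raise ValueError("taxable_share must be between 0 and 1")
    e01500 = cps_pension              # total pensions come from the CPS value
    e01700 = taxable_share * e01500   # taxable pensions derived from the total
    return e01500, e01700


# Old behavior for comparison: e01500 = 0 and e01700 = cps_pension,
# which is what produced units with total pensions below taxable pensions.
e01500, e01700 = nonfiler_pensions(4500.0, taxable_share=0.5)
```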
To prevent further issues like this, once we finish the UBI project I'll work on a test suite for the TaxData repo that includes logical checks like the one you conducted.
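As a rough idea of what one such logical check might look like, here is a minimal sketch using only the standard library. The function name is hypothetical and the sample rows are made up for illustration; a real test would read `puf.csv` or `cps.csv` instead of the inline sample.

```python
import csv
import io


def pension_violations(rows):
    """Return RECIDs of units whose taxable pensions exceed total pensions."""
    bad = []
    for row in rows:
        total = float(row["e01500"])     # total pensions and annuities
        taxable = float(row["e01700"])   # taxable pensions and annuities
        if taxable > total:
            bad.append(row["RECID"])
    return bad


# Tiny made-up sample standing in for the real input file.
SAMPLE = """RECID,e01500,e01700
1,12000,9000
2,0,4500
3,0,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
violations = pension_violations(rows)
print(violations)
```

A test in the suite could then simply assert that the returned list is empty for each data file.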
@andersonfrailey, Thanks for the analysis in #156.
What’s the timing on the new puf.csv file?
We need one to fix Tax-Calculator’s expanded income statistic.
@martinholmer I'll try and get you something today.
@andersonfrailey, Thanks but no rush. I won’t be back in the office until Thursday.
@andersonfrailey, Where's the pull request that made the new `puf.csv` you sent to me?
When using that new file, I get changes in AGI, which is not what I would expect if the new file simply fixed the `e01500` values (which are not in AGI).
@martinholmer I also changed `e01700`. Based on what we've done in the CPS file and my reading of CPS documentation, the pensions variable we currently use to set `e01700` should actually be `e01500`, and `e01700` should be a subset of that. So the changes in AGI can be attributed to that change. PR #158 shows the changes I've made.
Issue #156 has been resolved by pull request #158.
Consider this tabulation of output from `tc puf.csv 2014 --sqldb` and `tc cps.csv 2014 --sqldb`. Tabulating CPS output with this SQL script generates these results:
Which is what one would expect given the definitions of `e01500` and `e01700`. But the tabulation of the PUF output doesn't look right:
@andersonfrailey, why do we have over one thousand PUF filing units for whom total pensions are less than taxable pensions? These funny-looking cases are not present in the CPS data.
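For anyone wanting to reproduce a tabulation along these lines, here is a minimal sketch using Python's `sqlite3` module. It is not the SQL script referred to above. It assumes the `--sqldb` output has a table named `dump` containing `e01500` and `e01700` columns, and it counts filing units unweighted; an in-memory table with made-up rows stands in for the real database file.

```python
import sqlite3

# Classify each filing unit by how total pensions compare to taxable pensions.
QUERY = """
SELECT
    CASE
        WHEN e01500 < e01700 THEN 'total < taxable'
        WHEN e01500 = e01700 THEN 'total = taxable'
        ELSE 'total > taxable'
    END AS category,
    COUNT(*) AS units
FROM dump
GROUP BY category
ORDER BY category
"""


def tabulate(conn):
    """Return (category, unweighted unit count) pairs."""
    return conn.execute(QUERY).fetchall()


# In-memory stand-in for the database file; the table name and the
# sample values are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dump (RECID INTEGER, e01500 REAL, e01700 REAL)")
conn.executemany(
    "INSERT INTO dump VALUES (?, ?, ?)",
    [(1, 12000, 9000), (2, 0, 4500), (3, 0, 0), (4, 8000, 8000)],
)
result = tabulate(conn)
for category, units in result:
    print(category, units)
```

Any nonzero count in the `total < taxable` category points to the kind of funny-looking cases described above.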